Bug 1974357 - glibc pthreads updates break helgrind
Summary: glibc pthreads updates break helgrind
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: valgrind
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: beta
Target Release: ---
Assignee: Mark Wielaard
QA Contact: Jesus Checa
URL:
Whiteboard:
Depends On: 1958224
Blocks:
 
Reported: 2021-06-21 13:10 UTC by Mark Wielaard
Modified: 2021-12-07 21:34 UTC (History)

Fixed In Version: valgrind-3.17.0-8.el9
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-07 21:33:05 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
IBM Linux Technology Center | 193369 | 0 | None | None | None | 2021-06-22 07:34:45 UTC
KDE Software Compilation | 439590 | 0 | NOR | UNCONFIRMED | glibc-2.34 breaks suppressions against obj:*/lib*/libc-2.*so* | 2021-07-07 10:22:44 UTC
Red Hat Bugzilla | 1958224 | 1 | medium | CLOSED | glibc: Import first glibc 2.34 snapshot into CentOS Stream | 2023-07-18 14:29:19 UTC
Red Hat Bugzilla | 1975895 | 1 | unspecified | CLOSED | glibc: Preserve symtab in libc.so.6 and other critical shared objects | 2021-07-20 01:09:42 UTC

Internal Links: 1975895

Description Mark Wielaard 2021-06-21 13:10:46 UTC
glibc 2.34 changed how it handles pthread symbols (they are now in the main libc.so.6 file instead of in libpthread.so). This breaks helgrind (and drd).

See also this glibc bug: https://bugzilla.redhat.com/show_bug.cgi?id=1958224

Many helgrind tests fail with:
Helgrind: hg_main.c:5379 (hg_handle_client_request): Assertion 'found' failed.

Which has the following comment:

          /* Can this fail?  It would mean that our pthread_join
             wrapper observed a successful join on args[1] yet that
             thread never existed (or at least, it never lodged an
             entry in the mapping (via SET_MY_PTHREAD_T)).  Which
             sounds like a bug in the threads library. */

Another issue seen is:

WARNING: could not find symbol for var stack_cache_actsize in libpthread.so.0

Which is in ld.so now, GL (dl_stack_cache_actsize).
Part of _rtld_global.

Comment 1 Florian Weimer 2021-06-21 13:17:37 UTC
Please note that glibc 2.34 snapshots are not yet in the buildroot/composes.

Comment 2 Florian Weimer 2021-06-22 07:43:10 UTC
(In reply to Mark Wielaard from comment #0)
> Another issue seen is:
> 
> WARNING: could not find symbol for var stack_cache_actsize in libpthread.so.0
> 
> Which is in ld.so now, GL (dl_stack_cache_actsize).
> Part of _rtld_global.

If we need to disable the stack cache when running under valgrind, we can put that logic into glibc, instead of having valgrind patch glibc internals. (I understand there is a way for us to detect whether we are running under valgrind.)

What do you think?

Comment 3 Mark Wielaard 2021-06-22 12:34:21 UTC
(In reply to Florian Weimer from comment #2)
> (In reply to Mark Wielaard from comment #0)
> > Another issue seen is:
> > 
> > WARNING: could not find symbol for var stack_cache_actsize in libpthread.so.0
> > 
> > Which is in ld.so now, GL (dl_stack_cache_actsize).
> > Part of _rtld_global.
> 
> If we need to disable the stack cache when running under valgrind, we can
> put that logic into glibc, instead of having valgrind patch glibc internals.
> (I understand there is a way for us to detect whether we are running under
> valgrind.)
> 
> What do you think?

A supported way to do this would be great. The current documentation in valgrind reads:

/* glibc nptl pthread systems only, when no-nptl-pthread-stackcache
   was given in --sim-hints.
   Used for a (kludgy) way to disable the cache of stacks as implemented in
   nptl glibc. 
   Based on internal knowledge of the pthread glibc nptl/allocatestack.c code:
   a huge value in stack_cache_actsize (bigger than the constant
   stack_cache_maxsize) makes glibc believes the cache is full
   and so stacks are always released when a pthread terminates.
   Several ugliness in this kludge:
    * hardcodes private glibc var name "stack_cache_maxsize"
    * based on knowledge of the code of the functions
      queue_stack and __free_stacks
    * static symbol for "stack_cache_maxsize" must be in
      the debug info.
   It would be much cleaner to have a documented and supported
   way to disable the pthread stack cache. */

But note that this is only done when the no-nptl-pthread-stackcache SimHint is given, which isn't the default (but is enabled for some tests).
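
For readers unfamiliar with the kludge described in that comment, here is a toy model of it. This is only a sketch under stated assumptions: it is not glibc code, the class and method names merely echo the functions named in the comment (queue_stack, __free_stacks), and the 40 MiB limit and 8 MiB stack size are illustrative values. It shows why an inflated stack_cache_actsize means no terminated thread's stack stays cached.

```python
class StackCacheModel:
    """Toy model of the nptl stack cache accounting (not glibc code)."""

    def __init__(self, maxsize, actsize=0):
        self.maxsize = maxsize    # stands in for stack_cache_maxsize
        self.actsize = actsize    # stands in for stack_cache_actsize
        self.cached = []          # sizes of cached stacks, oldest first

    def queue_stack(self, size):
        """A thread terminated: cache its stack, then trim if over the limit."""
        self.cached.append(size)
        self.actsize += size
        if self.actsize > self.maxsize:
            self.free_stacks(self.maxsize)

    def free_stacks(self, limit):
        """Release cached stacks, oldest first, until the accounting fits."""
        while self.cached:
            self.actsize -= self.cached.pop(0)
            if self.actsize <= limit:
                break


MIB = 1 << 20

# Normal accounting: the terminated thread's stack stays in the cache.
normal = StackCacheModel(maxsize=40 * MIB)
normal.queue_stack(8 * MIB)
print("normal cache entries:", len(normal.cached))

# Inflated accounting, as the sim-hint does: the cache always looks full,
# so every queued stack is released again immediately.
hinted = StackCacheModel(maxsize=40 * MIB, actsize=1 << 62)
hinted.queue_stack(8 * MIB)
print("hinted cache entries:", len(hinted.cached))
```

In the model, the second instance never retains a stack because the counter can never drop below the limit, which is the effect the sim-hint relies on.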

To detect whether running under valgrind you can use the RUNNING_ON_VALGRIND macro defined in valgrind.h.

Comment 5 IBM Bug Proxy 2021-06-22 17:01:02 UTC
------- Comment From arnez.com 2021-06-22 12:56 EDT-------
FWIW, I just tested Valgrind on RHEL9 on s390x with a patched glibc, using Florian's patch for glibc-2.33-18.fc34.src.rpm from this Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1965374

Stefan Liebler thankfully adjusted the patches to RHEL9.  With this patched glibc, the failures in helgrind/tests disappear, except for pth_destroy_cond, which still fails.

Comment 7 IBM Bug Proxy 2021-06-23 12:41:03 UTC
------- Comment From STLI.com 2021-06-23 08:38 EDT-------
@Florian Weimer:
I've just got a report that starting with glibc-2.33-18.fc34, perf fails to add a probe as it can't find the symbol, e.g. for inet_pton:
perf probe -n -vvvvv -x /usr/lib64/libc-2.33.so -a inet_pton

But it succeeds for global symbols in ".symtab" of /usr/lib64/libc-2.33.so like e.g.
perf probe -n -vvvvv -x /usr/lib64/libc-2.33.so -a __memcpy_chk

Note: /usr/lib64/libc-2.33.so has a ".symtab" symbol-table with the symbols discussed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1965374#c11
which match the "strip -K xyz" invocation in wrap-find-debuginfo.sh in the glibc-source-package.

By contrast, in the older 2.33-8.fc34 package, /usr/lib64/libc-2.33.so does not have a ".symtab" symbol table at all, so I assume perf then searches for the symbol in the libc.so from the glibc-debuginfo package. There it finds inet_pton.

Can you please have a look?

Comment 8 Florian Weimer 2021-06-23 12:55:41 UTC
(In reply to IBM Bug Proxy from comment #7)
> ------- Comment From STLI.com 2021-06-23 08:38 EDT-------
> @Florian Weimer:
> I've just got a report that starting with glibc-2.33-18.fc34, perf fails to
> add a probe as it can't find the symbol, e.g. for inet_pton:
> perf probe -n -vvvvv -x /usr/lib64/libc-2.33.so -a inet_pton
> 
> But it succeeds for global symbols in ".symtab" of /usr/lib64/libc-2.33.so
> like e.g.
> perf probe -n -vvvvv -x /usr/lib64/libc-2.33.so -a __memcpy_chk

This looks like a perf deficiency. According to the manual page, it supports dynamic symbols.

However, I think the general expectation is that perf probes require the installation of debuginfo packages.

It may be worthwhile to open a separate bug report for this.

Comment 9 IBM Bug Proxy 2021-06-23 14:21:08 UTC
------- Comment From TMRICHT.com 2021-06-23 10:10 EDT-------
@Florian Weimer:

This is not related to missing glibc debug info; it is installed, as you can see:

[root@f34 ~]# perf probe -f -x /usr/lib64/libc-2.33.so -a inet_pton
Probe point 'inet_pton' not found.
Error: Failed to add events.
[root@f34 ~]# rpm -qa | fgrep glibc
glibc-all-langpacks-2.33-18.fc34.x86_64
glibc-common-2.33-18.fc34.x86_64
glibc-langpack-en-2.33-18.fc34.x86_64
glibc-2.33-18.fc34.x86_64
glibc-doc-2.33-18.fc34.noarch
glibc-headers-x86-2.33-18.fc34.noarch
glibc-devel-2.33-18.fc34.x86_64
glibc-debugsource-2.33-18.fc34.x86_64
glibc-debuginfo-2.33-18.fc34.x86_64
[root@f34 ~]#

The symbol inet_pton is now in the .dynsym section of glibc:
[root@f34 ~]# readelf -sW /usr/lib64/libc-2.33.so | egrep '(dynsym|symtab|inet_pton)'
Symbol table '.dynsym' contains 2419 entries:
628: 000000000011ea00   108 FUNC    WEAK   DEFAULT   15 inet_pton@@GLIBC_2.2.5
2251: 000000000011e9b0    76 FUNC    GLOBAL DEFAULT   15 __inet_pton_length@@GLIBC_PRIVATE
Symbol table '.symtab' contains 104 entries:
[root@f34 ~]#

Now perf does not find it. In the older version of the library, the
symbol inet_pton was also listed in the .symtab section:

[root@m35lp76 ~]# rpm -qa | fgrep glibc
glibc-common-2.32-4.fc33.s390x
glibc-langpack-en-2.32-4.fc33.s390x
glibc-2.32-4.fc33.s390x
glibc-headers-s390-2.32-4.fc33.noarch
glibc-devel-2.32-4.fc33.s390x
glibc-debuginfo-common-2.32-4.fc33.s390x
glibc-debuginfo-2.32-4.fc33.s390x
[root@m35lp76 ~]#

[root@m35lp76 ~]# readelf -sW /usr/lib64/libc-2.32.so | egrep '(dynsym|symtab|inet_pton)'
Symbol table '.dynsym' contains 2604 entries:
668: 00000000001444b0   788 FUNC    WEAK   DEFAULT   13 inet_pton@@GLIBC_2.2
2418: 00000000001441b0   764 FUNC    GLOBAL DEFAULT   13 __inet_pton_length@@GLIBC_PRIVATE
Symbol table '.symtab' contains 28858 entries:
20655: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS inet_pton.c
20656: 00000000001440b0     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton.c
20657: 00000000001447c4     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton.c_end
20658: 000000000002ba70     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton.c.hot
20659: 000000000002ba70     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton.c_end.hot
20660: 000000000002b938     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton.c.unlikely
20661: 000000000002b938     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton.c_end.unlikely
20662: 000000000002ba70     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton.c.startup
20663: 000000000002ba70     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton.c_end.startup
20664: 000000000002b968     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton.c.exit
20665: 000000000002b968     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton.c_end.exit
20666: 00000000001440b0     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton4.start
20667: 00000000001441aa     0 NOTYPE  LOCAL  HIDDEN    13 .annobin_inet_pton4.end
20668: 00000000001440b0   250 FUNC    LOCAL  DEFAULT   13 inet_pton4
20669: 00000000001441aa     0 NOTYPE  LOCAL  HIDDEN    13 .annobin___GI___inet_pton_length.start
20670: 00000000001444ac     0 NOTYPE  LOCAL  HIDDEN    13 .annobin___GI___inet_pton_length.end
20671: 00000000001444ac     0 NOTYPE  LOCAL  HIDDEN    13 .annobin___GI___inet_pton.start
20672: 00000000001447c4     0 NOTYPE  LOCAL  HIDDEN    13 .annobin___GI___inet_pton.end
23591: 00000000001441b0   764 FUNC    LOCAL  DEFAULT   13 __GI___inet_pton_length
23871: 00000000001444b0   788 FUNC    LOCAL  DEFAULT   13 __inet_pton
24506: 00000000001444b0   788 FUNC    LOCAL  DEFAULT   13 __GI___inet_pton
25831: 00000000001444b0   788 FUNC    LOCAL  DEFAULT   13 __GI_inet_pton
26410: 00000000001441b0   764 FUNC    GLOBAL DEFAULT   13 __inet_pton_length
27288: 00000000001444b0   788 FUNC    WEAK   DEFAULT   13 inet_pton
[root@m35lp76 ~]#

And perf could find the symbol, extract its address and install a probe
on that address.
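
The readelf comparison above can also be scripted. The following is a minimal sketch, assuming a 64-bit ELF image and using only the Python standard library; the function name tables_listing is made up for illustration. It reports whether a symbol name appears in the dynamic symbol table (SHT_DYNSYM), the full symbol table (SHT_SYMTAB), or both. Symbol versions (the @@GLIBC_... suffix readelf prints) are stored separately, so the bare name is compared.

```python
import struct

SHT_SYMTAB, SHT_DYNSYM = 2, 11
TABLE_NAMES = {SHT_SYMTAB: "symtab", SHT_DYNSYM: "dynsym"}


def tables_listing(image: bytes, symbol: str) -> list:
    """Return which symbol tables of a 64-bit ELF image list `symbol`."""
    if image[:4] != b"\x7fELF" or image[4] != 2:
        raise ValueError("not a 64-bit ELF image")
    end = "<" if image[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big endian

    # ELF64 file header: we only need the section header table location.
    hdr = struct.unpack_from(end + "16sHHIQQQIHHHHHH", image, 0)
    shoff, shentsize, shnum = hdr[6], hdr[11], hdr[12]

    # ELF64 section headers (64 bytes each).
    sections = [
        struct.unpack_from(end + "IIQQQQIIQQ", image, shoff + i * shentsize)
        for i in range(shnum)
    ]

    wanted = symbol.encode()
    found = []
    for _name, sh_type, _fl, _addr, offset, size, link, _info, _al, entsize in sections:
        if sh_type not in TABLE_NAMES or entsize == 0:
            continue
        str_off = sections[link][4]       # sh_link names the string table
        for pos in range(offset, offset + size, entsize):
            st_name = struct.unpack_from(end + "I", image, pos)[0]
            start = str_off + st_name
            if image[start:image.index(b"\0", start)] == wanted:
                found.append(TABLE_NAMES[sh_type])
                break
    return found
```

A call such as tables_listing(open("/usr/lib64/libc-2.33.so", "rb").read(), "inet_pton") would be the scripted counterpart of the readelf | egrep pipeline; the path is taken from the transcript above and may differ on other systems.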

Comment 10 Florian Weimer 2021-06-24 16:36:23 UTC
I filed bug 1975895 to consider restoring .symtab in toto.

Comment 14 Mark Wielaard 2021-07-06 16:56:16 UTC
One thing to be aware of with glibc 2.34 is (from the glibc NEWS file):

* Previously, glibc installed its various shared objects under versioned
  file names such as libc-2.33.so.  The ABI sonames (e.g., libc.so.6)
  were provided as symbolic links.  Starting with glibc 2.34, the shared
  objects are installed under their ABI sonames directly, without
  symbolic links.  This increases compatibility with distribution
  package managers that delete removed files late during the package
  upgrade or downgrade process.

This can impact valgrind suppressions that use obj:*/lib*/libc-2.*so*
Those won't match anymore because the file name seen in the process memory map
will be /usr/lib64/libc.so.6 instead of /usr/lib64/libc-2.34.so.
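
The effect on the pattern can be checked quickly. This is a rough sketch that uses Python's fnmatch as a stand-in for valgrind's own wildcard matcher (an approximation, not valgrind's code). CANDIDATE_PATTERN is a hypothetical replacement shown only for illustration; it is not a pattern taken from this bug or from valgrind's default suppressions.

```python
from fnmatch import fnmatchcase

# Pattern from the comment above, without the "obj:" prefix.
OLD_PATTERN = "*/lib*/libc-2.*so*"
# Hypothetical pattern that targets the soname-based file name instead.
CANDIDATE_PATTERN = "*/lib*/libc.so*"


def matches(pattern, path):
    """Return True if the object path matches the wildcard pattern."""
    return fnmatchcase(path, pattern)


for path in ("/usr/lib64/libc-2.33.so", "/usr/lib64/libc.so.6"):
    print(path,
          "old:", matches(OLD_PATTERN, path),
          "candidate:", matches(CANDIDATE_PATTERN, path))
```

Under these assumptions the old pattern matches only the versioned file name and the candidate only the soname, so a suppression file meant to work with glibc both before and after 2.34 would probably need both forms.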

