Bug 1994068 - glibc: pthread_cancel fails with ESRCH yet subsequent pthread_join passes
Summary: glibc: pthread_cancel fails with ESRCH yet subsequent pthread_join passes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: glibc
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: beta
Target Release: ---
Assignee: Florian Weimer
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On: 1994653
Blocks:
Reported: 2021-08-16 16:03 UTC by Václav Kadlčík
Modified: 2023-07-18 14:29 UTC
CC List: 7 users

Fixed In Version: glibc-2.34-7.el9
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-21 13:30:21 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
reproducer (2.43 KB, text/plain)
2021-08-16 16:03 UTC, Václav Kadlčík


Links
Red Hat Issue Tracker RHELPLAN-93744 (last updated 2021-08-17 06:26:39 UTC)
Sourceware 12889 (last updated 2021-08-17 06:26:07 UTC)
Sourceware 19193 (last updated 2021-08-17 06:24:17 UTC)
Sourceware 28407, priority P2, status ASSIGNED: "pthread_kill assumes that kill (getpid ()) is equivalent to tgkill (getpid (), gettid())" (last updated 2021-10-01 14:14:21 UTC)

Description Václav Kadlčík 2021-08-16 16:03:59 UTC
Created attachment 1814520
reproducer

Description of problem:

I have a reproducer for an old and unrelated gcc bug. With glibc
2.34, the reproducer has started to behave differently than before,
and it looks like a (glibc?) bug to me.

I'm attaching the program (rep.c), but the key part is:

  err = pthread_create(&th, NULL, tf, NULL);
  err = pthread_cancel(th);
  err = pthread_join(th, NULL);

repeated many times over.

Sometimes pthread_cancel returns ESRCH (this is what's actually
new; I hadn't observed these failures before glibc 2.34). When
pthread_cancel returns ESRCH, I'd expect the subsequent pthread_join
to fail with ESRCH, too. However, that never happens in the program.

To sum it up, pthread_cancel looks suspicious to me. It has
started to fail recently, and the reported reason (ESRCH) is not
"confirmed" by the subsequent pthread_join call.


Version-Release number of selected component (if applicable):

glibc-2.34-2.el9
gcc-11.2.1-2.2.el9
annobin-9.83-3.el9
kernel-5.14.0-0.rc4.35.el9


Steps to Reproduce:

1. gcc -O2 rep.c -g -o rep -lpthread
2. ./rep 200


Actual results:

Various numbers of the following line:
pthread_cancel failed: No thread with the ID thread could be found


Expected results:

I'd expect either
  * no pthread_cancel failures, or
  * every pthread_cancel failure (err=ESRCH) to be followed by
    a pthread_join failure (err=ESRCH).


Additional info:

* "Slow" machines tend to produce more occurrences of the problem.
* Not architecture-specific.

Comment 2 Florian Weimer 2021-08-17 06:24:18 UTC
Nice catch. The changed pthread_cancel implementation in glibc 2.34 triggers a known bug in pthread_kill.

Comment 3 Florian Weimer 2021-08-17 13:51:56 UTC
Patches posted upstream: https://sourceware.org/pipermail/libc-alpha/2021-August/130207.html

Comment 5 Florian Weimer 2021-09-13 12:20:30 UTC
Upstream patches have been committed (also to the 2.34 release branch).

Comment 7 Florian Weimer 2021-10-01 14:14:21 UTC
The most recent fix introduced a regression, so we need to respin this one.

Comment 8 Florian Weimer 2021-10-21 13:30:21 UTC
Regression fix is in glibc-upstream-2.34-28.patch, part of glibc-2.34-7.el9:

    commit 40bade26d5bcbda3d21fb598c5063d9df62de966
    Author: Florian Weimer <fweimer>
    Date:   Fri Oct 1 18:16:41 2021 +0200
    
        nptl: pthread_kill must send signals to a specific thread [BZ #28407]

