Created attachment 1814520: reproducer
Description of problem:
I have a reproducer for an old and unrelated gcc bug. With glibc
2.34, the reproducer has started to behave differently than before
and it looks to me like a (glibc?) bug.
I'm attaching the program (rep.c) but the key functionality is:
err = pthread_create(&th, NULL, tf, NULL);
err = pthread_cancel(th);
err = pthread_join(th, NULL);
repeated many times over.
Sometimes pthread_cancel returns ESRCH. This is what is actually
new: I had not observed these failures before glibc 2.34. When
pthread_cancel returns ESRCH, I would expect the subsequent
pthread_join to fail with ESRCH, too. However, that never happens
in the program.
To sum it up, pthread_cancel looks suspicious to me. It has
started to fail recently and the reported reason (ESRCH) doesn't
get "confirmed" by a subsequent pthread_join call.
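For readers without access to the attachment, the pattern described above can be sketched roughly as follows. This is not the attached rep.c: the body of tf (assumed here to return immediately, so that the thread may already have exited when pthread_cancel runs), the counts struct, and the run_iterations helper are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical thread function (assumption): it returns right away, so the
   thread can finish before the creator gets to call pthread_cancel. */
static void *tf(void *arg)
{
    (void)arg;
    return NULL;
}

/* Hypothetical bookkeeping for the outcomes discussed in this report. */
struct counts {
    int create_failed; /* pthread_create returned nonzero */
    int cancel_esrch;  /* pthread_cancel returned ESRCH */
    int join_failed;   /* pthread_join returned nonzero */
};

/* Run the create/cancel/join sequence n times and count the outcomes. */
static struct counts run_iterations(int n)
{
    struct counts c = {0, 0, 0};

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        pthread_t th;
        int err = pthread_create(&th, NULL, tf, NULL);
        if (err != 0) {
            c.create_failed++;
            continue; /* no thread to cancel or join */
        }

        err = pthread_cancel(th);
        if (err == ESRCH)
            c.cancel_esrch++;
        else if (err != 0)
            fprintf(stderr, "pthread_cancel failed: %s\n", strerror(err));

        /* The thread is joinable and has not been joined yet, so its ID
           is still valid at this point. */
        err = pthread_join(th, NULL);
        if (err != 0)
            c.join_failed++;
    }
    return c;
}
```

Calling run_iterations(200) from main and printing the three counters approximates the "./rep 200" invocation. In this sketch, a nonzero cancel_esrch combined with join_failed == 0 corresponds to the mismatch reported here.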
Version-Release number of selected component (if applicable):
glibc-2.34-2.el9
gcc-11.2.1-2.2.el9
annobin-9.83-3.el9
kernel-5.14.0-0.rc4.35.el9
Steps to Reproduce:
1. gcc -O2 rep.c -g -o rep -lpthread
2. ./rep 200
Actual results:
A varying number of occurrences of the following line:
pthread_cancel failed: No thread with the ID thread could be found
Expected results:
I'd expect either
* no pthread_cancel failures or
* every pthread_cancel failure (err=ESRCH) to be followed by
a pthread_join failure (err=ESRCH).
Additional info:
* "Slow" machines tend to produce more occurrences of the problem.
* Not architecture specific
Regression fix is in glibc-upstream-2.34-28.patch, part of glibc-2.34-7.el9:
commit 40bade26d5bcbda3d21fb598c5063d9df62de966
Author: Florian Weimer <fweimer>
Date: Fri Oct 1 18:16:41 2021 +0200
nptl: pthread_kill must send signals to a specific thread [BZ #28407]