Bug 1994068
| Summary: | glibc: pthread_cancel fails with ESRCH yet subsequent pthread_join passes | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Václav Kadlčík <vkadlcik> | ||||
| Component: | glibc | Assignee: | Florian Weimer <fweimer> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | qe-baseos-tools-bugs | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 9.0 | CC: | ashankar, codonell, dj, fweimer, mnewsome, pfrankli, sipoyare | ||||
| Target Milestone: | beta | Keywords: | Bugfix, Patch, Triaged | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | glibc-2.34-7.el9 | Doc Type: | No Doc Update | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2021-10-21 13:30:21 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1994653 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
Nice catch. The changed pthread_cancel implementation in glibc 2.34 triggers a known bug in pthread_kill. Patches posted upstream: https://sourceware.org/pipermail/libc-alpha/2021-August/130207.html

The upstream patches have been committed (also to the 2.34 release branch).

The most recent fix introduced a regression, so we need to respin this one.

The regression fix is in glibc-upstream-2.34-28.patch, part of glibc-2.34-7.el9:
commit 40bade26d5bcbda3d21fb598c5063d9df62de966
Author: Florian Weimer <fweimer>
Date: Fri Oct 1 18:16:41 2021 +0200
nptl: pthread_kill must send signals to a specific thread [BZ #28407]
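The pthread_kill behavior discussed here can be probed with a small sketch like the one below. It is not taken from the glibc patches or from the attached reproducer. The helper name `probe_exited_thread` and the 100 ms wait are assumptions made for illustration. The idea is that a thread which has terminated but has not yet been joined still has a valid thread ID, so a pthread_kill call with signal 0 would be expected to return 0. An affected glibc may report ESRCH instead, while the later pthread_join still succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>

/* Thread body: return immediately so the thread terminates quickly. */
static void *exit_at_once(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Hypothetical helper (not part of glibc or of rep.c).
 * Creates a thread, waits until it has most likely terminated, then
 * records what pthread_kill(th, 0) and pthread_join(th, NULL) return.
 * Returns 0 on success, or the pthread_create error code.
 */
static int probe_exited_thread(int *kill_err, int *join_err)
{
    pthread_t th;
    struct timespec wait = { 0, 100 * 1000 * 1000 }; /* 100 ms, a heuristic */

    assert(kill_err != NULL && join_err != NULL);

    int err = pthread_create(&th, NULL, exit_at_once, NULL);
    if (err != 0)
        return err;

    /* Assumption: 100 ms is long enough for the thread to exit. */
    nanosleep(&wait, NULL);

    /* Signal 0 only performs error checking; no signal is delivered. */
    *kill_err = pthread_kill(th, 0);

    /* The thread is joinable and unjoined, so its ID is still valid. */
    *join_err = pthread_join(th, NULL);
    return 0;
}
```

A `*kill_err` of ESRCH combined with a `*join_err` of 0 would match the inconsistency reported in this bug. A `*kill_err` of 0 is what one would expect from a fixed build.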
Created attachment 1814520: reproducer

Description of problem:
I have a reproducer for an old and unrelated gcc bug. With glibc 2.34, the reproducer has started to behave differently than before, and it looks to me like a (glibc?) bug. I'm attaching the program (rep.c), but the key functionality is:

    err = pthread_create(&th, NULL, tf, NULL);
    err = pthread_cancel(th);
    err = pthread_join(th, NULL);

repeated many times over. Sometimes pthread_cancel returns ESRCH (this is what's actually new; I hadn't observed these failures before glibc 2.34). When pthread_cancel returns ESRCH, I'd expect the subsequent pthread_join to fail with ESRCH, too. However, that never happens in the program.

To sum up, pthread_cancel looks suspicious to me. It has started to fail recently, and the reported reason (ESRCH) doesn't get "confirmed" by the subsequent pthread_join call.

Version-Release number of selected component (if applicable):
glibc-2.34-2.el9
gcc-11.2.1-2.2.el9
annobin-9.83-3.el9
kernel-5.14.0-0.rc4.35.el9

Steps to Reproduce:
1. gcc -O2 rep.c -g -o rep -lpthread
2. ./rep 200

Actual results:
Various numbers of the following line:
pthread_cancel failed: No thread with the ID thread could be found

Expected results:
I'd expect either
* no pthread_cancel failures or
* every pthread_cancel failure (err=ESRCH) to be followed by a pthread_join failure (err=ESRCH).

Additional info:
* "Slow" machines tend to produce more occurrences of the problem.
* Not architecture specific
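Since rep.c is only attached and not quoted, the loop described above might look roughly like the following sketch. This is not the attached reproducer. The thread function body, the `loop_stats` struct and the `run_cancel_join_loop` name are assumptions. In particular, the real `tf` may do more work than returning immediately.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical counters for the outcomes of interest. */
struct loop_stats {
    int cancel_esrch;   /* pthread_cancel returned ESRCH */
    int cancel_other;   /* pthread_cancel returned some other error */
    int join_failed;    /* pthread_join returned any error */
};

/* Assumed thread body: exits at once so that cancel races with exit. */
static void *tf(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Repeats create / cancel / join and counts the results.
 * Returns 0 on success, or the pthread_create error code.
 */
static int run_cancel_join_loop(int iterations, struct loop_stats *stats)
{
    assert(stats != NULL);
    stats->cancel_esrch = 0;
    stats->cancel_other = 0;
    stats->join_failed = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t th;
        int err = pthread_create(&th, NULL, tf, NULL);
        if (err != 0)
            return err;

        /* The thread may already have exited here; that is the race. */
        err = pthread_cancel(th);
        if (err == ESRCH)
            stats->cancel_esrch++;
        else if (err != 0)
            stats->cancel_other++;

        /* The thread has not been joined yet, so this should succeed. */
        err = pthread_join(th, NULL);
        if (err != 0)
            stats->join_failed++;
    }
    return 0;
}
```

Calling `run_cancel_join_loop(200, &stats)` from a `main` function and printing the counters would mirror `./rep 200`. On an affected glibc, `cancel_esrch` may be nonzero while `join_failed` stays at 0, which is the mismatch this report describes.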