Created attachment 1814520
reproducer

Description of problem:

I have a reproducer for an old and unrelated gcc bug. With glibc 2.34, the reproducer has started to behave differently than before, and it looks to me like a (glibc?) bug. I'm attaching the program (rep.c), but the key functionality is:

    err = pthread_create(&th, NULL, tf, NULL);
    err = pthread_cancel(th);
    err = pthread_join(th, NULL);

repeated many times over.

Sometimes pthread_cancel returns ESRCH. This is what's actually new: I haven't observed these failures before glibc 2.34. When pthread_cancel returns ESRCH, I'd expect the subsequent pthread_join to fail with ESRCH too. However, that never happens in the program.

To sum up, pthread_cancel looks suspicious to me. It has started to fail recently, and the reported reason (ESRCH) doesn't get "confirmed" by a subsequent pthread_join call.

Version-Release number of selected component (if applicable):
glibc-2.34-2.el9
gcc-11.2.1-2.2.el9
annobin-9.83-3.el9
kernel-5.14.0-0.rc4.35.el9

Steps to Reproduce:
1. gcc -O2 rep.c -g -o rep -lpthread
2. ./rep 200

Actual results:
Various numbers of the following line:
pthread_cancel failed: No thread with the ID thread could be found

Expected results:
I'd expect either
* no pthread_cancel failures, or
* every pthread_cancel failure (err=ESRCH) to be followed by a pthread_join failure (err=ESRCH).

Additional info:
* "Slow" machines tend to produce more occurrences of the problem.
* Not architecture specific
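For readers without access to the attachment, the create/cancel/join sequence described above can be sketched roughly as follows. This is not the attached rep.c: the body of tf (returning immediately) and the helper name run_iterations with its counters are assumptions made for illustration only.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical thread function (the real tf in rep.c is not shown):
   it returns at once, so the thread may already have exited by the
   time pthread_cancel is called. */
static void *tf(void *arg)
{
    (void)arg;
    return NULL;
}

/* Hypothetical helper: runs the create/cancel/join sequence n times.
   Counts pthread_cancel calls that returned ESRCH and pthread_join
   calls that returned any error. Returns 0, or the pthread_create
   error code if a thread could not be created. */
static int run_iterations(int n, int *cancel_esrch, int *join_failed)
{
    *cancel_esrch = 0;
    *join_failed = 0;

    for (int i = 0; i < n; i++) {
        pthread_t th;
        int err = pthread_create(&th, NULL, tf, NULL);
        if (err != 0)
            return err;

        err = pthread_cancel(th);
        if (err == ESRCH)
            (*cancel_esrch)++;

        /* The thread has not been joined yet, so its ID is still
           valid and the join itself is expected to succeed. */
        err = pthread_join(th, NULL);
        if (err != 0)
            (*join_failed)++;
    }
    return 0;
}
```

Calling run_iterations(200, &cancel_esrch, &join_failed) from a main function and printing both counters would show the mismatch reported here: on an affected glibc the first counter may be nonzero while the second stays at zero.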
Nice catch. The changed pthread_cancel implementation in glibc 2.34 triggers a known bug in pthread_kill.
Patches posted upstream: https://sourceware.org/pipermail/libc-alpha/2021-August/130207.html
Upstream patches have been committed (also to the 2.34 release branch).
The most recent fix introduced a regression, so we need to respin this one.
Regression fix is in glibc-upstream-2.34-28.patch, part of glibc-2.34-7.el9:

commit 40bade26d5bcbda3d21fb598c5063d9df62de966
Author: Florian Weimer <fweimer>
Date:   Fri Oct 1 18:16:41 2021 +0200

    nptl: pthread_kill must send signals to a specific thread [BZ #28407]