Bug 146911 - Thread suspension via async signal fails on rhel4-rc2
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64 Linux
Target Milestone: ---
Assignee: Ingo Molnar
QA Contact: Brian Brock
Depends On:
Reported: 2005-02-02 18:10 UTC by David Simms
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2005-06-08 15:13:42 UTC

Attachments
Repro (8.27 KB, text/plain)
2005-02-02 18:12 UTC, David Simms
Patch adding the missing "lock" prefix (566 bytes, patch)
2005-02-07 21:11 UTC, Suresh Siddha

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2005:420 normal SHIPPED_LIVE Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 1 2005-06-08 04:00:00 UTC

Description David Simms 2005-02-02 18:10:57 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041107 Firefox/1.0

Description of problem:
When using pthread_kill and signal handling to perform thread
suspension, we get an unexplained deadlock. This happens for both ia32
and x86_64 compiled code.


Suspender thread runs...

// ... ensure suspendee running...
pthread_kill(suspendee, suspendSignal)

While suspendee threads run...

   while (notSuspendedEnough())

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
I will attach a repro in which the main thread signals a number of threads,
which acknowledge and then wait until signalled again.

1. gcc -g -Wall -lpthread -o susphello susphello.c
2. ./susphello
3. Wait less than a minute for it to lock up

Actual Results:  Deadlocks after a random amount of time, normally less
than 10 seconds. No doubt hardware dependent; I was using a two-way with

Upon deadlock, the thread we are waiting for shows the signal as
pending (via procfs/ps), and both procfs and gdb show the thread is in
a system call (or at least at a system call boundary). WCHAN shows "-"
and the "sys-rq trace" shows RUNNING (user code).

Expected Results:  The suspendee should receive the suspend signal and
acknowledge it, with either sem_post or pthread_kill (as defined in the test case).

Additional info:

uname: 2.6.9-1.906_ELsmp #1 SMP Sun Dec 12 23:05:02 EST 2004 x86_64
x86_64 x86_64 GNU/Linux

rpm -q --queryformat '\n%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' glibc:


Comment 1 David Simms 2005-02-02 18:12:17 UTC
Created attachment 110564

gcc -g -Wall -lpthread -o susphello susphello.c && ./susphello

Comment 2 Jay Turner 2005-02-03 08:12:31 UTC
I'm not able to reproduce on my HT IA32 box, but am able to reproduce readily on
a 4-way x86_64 (EM64T) box.  Both boxes are running kernel-2.6.9-5.EL and

Another data point: copying the 32-bit susphello binary to the x86_64
machine and running it there results in the lockup as well.

Comment 5 Suresh Siddha 2005-02-07 21:11:27 UTC
Created attachment 110755
Patch adding the missing "lock" prefix

Attached patch seems to fix the issue. Will post the patch to upstream kernel
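
The patch itself is only available as an attachment and is not reproduced here. As general background on why a missing "lock" prefix matters on SMP x86, the user-space sketch below has two threads update different bits of one shared word. The names are hypothetical and this is not the kernel code the patch touches. With atomic read-modify-write operations (which compilers typically emit as lock-prefixed instructions on x86), neither thread can overwrite the other's bit. With a plain, unlocked read-modify-write, one CPU's update can be lost.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_uint shared_flags;   /* hypothetical word holding per-thread flag bits */

static void *toggle_bit(void *arg)
{
    unsigned bit = *(const unsigned *)arg;
    int i;

    for (i = 0; i < 100000; i++) {
        atomic_fetch_or(&shared_flags, bit);    /* atomic set of this thread's bit */
        atomic_fetch_and(&shared_flags, ~bit);  /* atomic clear of the same bit */
    }
    atomic_fetch_or(&shared_flags, bit);        /* leave the bit set at the end */
    return NULL;
}

/* Run two threads that hammer different bits of the same word and return
 * the final value. With atomic operations both bits must end up set. */
unsigned run_flag_demo(void)
{
    pthread_t a, b;
    unsigned bit_a = 1u << 0, bit_b = 1u << 1;

    atomic_store(&shared_flags, 0);
    pthread_create(&a, NULL, toggle_bit, &bit_a);
    pthread_create(&b, NULL, toggle_bit, &bit_b);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&shared_flags);
}
```

If the atomic calls were replaced with plain `|=` and `&=` on an ordinary unsigned variable, the final value could occasionally be missing a bit on a multiprocessor machine. A lost flag update of that kind would be consistent with a signal that stays pending and is never acted on, though whether that is the exact mechanism here is for the patch to confirm.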

Comment 8 Johan Walles 2005-02-11 17:18:34 UTC
I have verified that this patch resolves the problem demonstrated by
the repro case.  Thanks, Suresh.

Comment 13 Bob Johnson 2005-03-01 17:39:02 UTC
Folks at BEA, this is slated for inclusion in U1 Beta.
Please reply with your testing of this particular item once we make
the Beta available to you, thanks.

Comment 15 Tim Powers 2005-06-08 15:13:43 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

