The pthreads (linuxthreads) library in glibc 2.1 keeps only
one "next" pointer in the thread descriptor, on the
assumption that a thread is only ever queued for one thing
at a time.
The CV (condition variable) code, however, keeps a queue of
the threads waiting on a CV and protects that queue with a
lock. Under contention the lock queues its waiters too, so a
thread that is still sitting on a CV queue can try to take
the lock (to remove itself from the CV queue) and end up
queued on the lock as well. Both queues then need the same
"next" pointer.
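To illustrate why a single link is a problem, here is a minimal sketch. It is not the linuxthreads code: the descriptor layout, the names (thread_desc, enqueue, queue_length, demo_single_link) and the plain LIFO queue are all assumptions made for illustration. It only shows that linking a descriptor into a second queue through the same pointer truncates the first queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified thread descriptor: one link shared by every queue. */
struct thread_desc {
    int id;
    struct thread_desc *next;   /* the only queue link */
};

/* Push t onto the front of the singly linked queue *head. */
static void enqueue(struct thread_desc **head, struct thread_desc *t)
{
    t->next = *head;
    *head = t;
}

/* Count the descriptors reachable from head. */
static int queue_length(const struct thread_desc *head)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Returns the CV queue length after b, still on the CV queue, also queues on a lock. */
static int demo_single_link(void)
{
    struct thread_desc a = {1, NULL}, b = {2, NULL}, c = {3, NULL};
    struct thread_desc *cv_queue = NULL, *lock_queue = NULL;

    enqueue(&cv_queue, &a);
    enqueue(&cv_queue, &b);
    enqueue(&cv_queue, &c);             /* CV queue: c -> b -> a */
    assert(queue_length(cv_queue) == 3);

    enqueue(&lock_queue, &b);           /* overwrites b.next with NULL */

    return queue_length(cv_queue);      /* CV queue is now c -> b; a is lost */
}
```

In this sketch demo_single_link() returns 2 rather than 3, because thread a is no longer reachable from the CV queue.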
The URL points to a program (swarm.c) that reproduces the
bug on 2-CPU machines (tweak the #defines for a 4-way
machine) by creating 20 threads that keep waiting 1 second
on a single CV. Every second, all 20 threads dequeue from
the CV and then requeue.
The URL also points to a patch (patch.glibc) for the
linuxthreads library to fix this bug. It adds a second
"next" pointer to the thread descriptor so that a thread
can be queued for some event (like a CV) and also queue on
a lock for internal synchronization.
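The idea behind the fix can be sketched as below. This is not the contents of patch.glibc: the field names (next_waiting, next_lock) and helper functions are hypothetical, and the queues are plain LIFO lists for brevity. It only shows that with one link per kind of queue, queueing on a lock leaves the CV queue intact.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor with one link per kind of queue. */
struct thread_desc2 {
    int id;
    struct thread_desc2 *next_waiting;  /* link for event queues such as a CV */
    struct thread_desc2 *next_lock;     /* link for internal lock queues */
};

/* Push t onto an event (CV) queue using the event link only. */
static void enqueue_waiting(struct thread_desc2 **head, struct thread_desc2 *t)
{
    t->next_waiting = *head;
    *head = t;
}

/* Push t onto a lock queue using the lock link only. */
static void enqueue_lock(struct thread_desc2 **head, struct thread_desc2 *t)
{
    t->next_lock = *head;
    *head = t;
}

/* Count the descriptors reachable through the event link. */
static int waiting_length(const struct thread_desc2 *head)
{
    int n = 0;
    for (; head != NULL; head = head->next_waiting)
        n++;
    return n;
}

/* Returns the CV queue length after b also queues on a lock. */
static int demo_two_links(void)
{
    struct thread_desc2 a = {1, NULL, NULL}, b = {2, NULL, NULL},
                        c = {3, NULL, NULL};
    struct thread_desc2 *cv_queue = NULL, *lock_queue = NULL;

    enqueue_waiting(&cv_queue, &a);
    enqueue_waiting(&cv_queue, &b);
    enqueue_waiting(&cv_queue, &c);     /* CV queue: c -> b -> a */

    enqueue_lock(&lock_queue, &b);      /* touches b.next_lock only */
    assert(lock_queue == &b);

    return waiting_length(cv_queue);    /* still c -> b -> a */
}
```

Here demo_two_links() returns 3: the CV queue keeps all of its members while b waits for the lock.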
I have also emailed this patch to Ulrich Drepper
<drepper> of the glibc team.
Since you can't see the URL in the normal bug printout (why?), I'll
echo it here:
Fixed in glibc-2.1.2-1, which will be available from Rawhide shortly.