Bug 87656 - LTC2324-Thread improperly loses lock on mutex when thread is cancelled.
Status: CLOSED ERRATA
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 8.0
Hardware: i686
OS: Linux
Priority: high
Severity: high
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Duplicates: 87704
Reported: 2003-03-31 17:31 EST by IBM Bug Proxy
Modified: 2016-11-24 09:56 EST

Doc Type: Bug Fix
Last Closed: 2003-04-10 19:09:24 EDT


Description IBM Bug Proxy 2003-03-31 17:31:45 EST
A thread's lock on a mutex appears to be released when the thread is cancelled,
violating the definition of how pthreads should work.

If a thread could be cancelled while holding a lock on a mutex, it is the
application's responsibility to unlock the mutex in a thread cancelation
cleanup handler.  I have product code that does this.  That code no longer
works correctly on Red Hat 8.0, with glibc-2.3.2-4.80.
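For reference, the cleanup-handler pattern described above can be sketched as follows. This is a minimal example under stated assumptions, not the reporter's product code: `state_lock`, `work_done`, `unlock_mutex`, `worker`, and `cancel_leaves_mutex_unlocked` are hypothetical names. One design choice differs from the test programs below: the handler is pushed only after the mutex has been acquired and is popped with a nonzero argument, so it runs exactly when this thread holds the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state protected by state_lock. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static long work_done = 0;

/* Cleanup handler: releases the mutex passed as its argument. */
static void unlock_mutex(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *) arg);
}

static void *worker(void *arg)
{
    (void) arg;
    for (;;) {
        pthread_mutex_lock(&state_lock);
        /* Registered only after the lock is held, so the handler can never
           release a mutex this thread does not own. */
        pthread_cleanup_push(unlock_mutex, &state_lock);
        work_done++;                 /* critical section */
        pthread_testcancel();        /* cancellation point while locked */
        pthread_cleanup_pop(1);      /* nonzero: run the handler, i.e. unlock */
    }
    return NULL;                     /* not reached */
}

/* Cancels a worker and reports whether state_lock is free afterwards:
   1 if the handler released it, 0 otherwise. */
int cancel_leaves_mutex_unlocked(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return 0;
    pthread_cancel(tid);
    pthread_join(tid, &result);
    if (result != PTHREAD_CANCELED)
        return 0;
    if (pthread_mutex_trylock(&state_lock) != 0)
        return 0;                    /* still held: the handler did not run */
    pthread_mutex_unlock(&state_lock);
    return 1;
}
```

Call `cancel_leaves_mutex_unlocked()` from `main()` and link with `-pthread`; on a conforming pthreads library it should return 1. Because `pthread_cleanup_push()` and `pthread_cleanup_pop()` must appear as a pair in the same lexical scope, placing both inside the loop body keeps the lock and its handler tied to the same iteration.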

Hardware Environment:

xSeries, but I don't think that's important.

Software Environment:

Red Hat 8.0, with glibc 2.3.2-4.80, as shown below:

  # rpm -qa | grep glibc
  glibc-2.3.2-4.80
  glibc-kernheaders-2.4-7.20
  glibc-common-2.3.2-4.80
  glibc-devel-2.3.2-4.80
  glibc-debug-2.3.2-4.80


Steps to Reproduce:

1. Compile the following test program, named tc1.c: "make CFLAGS=-lpthread tc1"

----- test program beginning ----

#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <assert.h>


#define NUM_THREADS     5
#define LIFETIME_SECS   1


static pthread_mutexattr_t  the_mutex_attr;
static pthread_mutex_t      the_mutex;


void thread_rtn_cleanup(void *arg_p)
{
    pthread_t me;
    int       rc;

    me = pthread_self();

    rc = pthread_mutex_unlock(&the_mutex);

    if (rc != 0) {
        /* pthread_t is an unsigned long on Linux; cast it to print with %lu. */
        fprintf(stderr, "%lu: thread_rtn_cleanup: pthread_mutex_unlock() "
                "returned %d.\n", (unsigned long) me, rc);
    }

    return;
}


void *thread_rtn(void *arg)
{
    pthread_t me;
    int       rc;


    me = pthread_self();

    fprintf(stderr, "Thread %lu created.\n", (unsigned long) me);

    pthread_cleanup_push(thread_rtn_cleanup, NULL);

    while (1) {
        rc = pthread_mutex_lock(&the_mutex);
        assert(rc == 0);

        pthread_testcancel();   /* force the issue of thread cancellation   */

        rc = pthread_mutex_unlock(&the_mutex);

        if (rc != 0) {
            fprintf(stderr, "%lu: thread_rtn: pthread_mutex_unlock() "
                    "returned %d.\n", (unsigned long) me, rc);
        }
    }

    pthread_cleanup_pop(0);

    return NULL;
}


int main(int argc, char **argv)
{
    pthread_t thread_ids[NUM_THREADS];
    int t, rc;


    rc = pthread_mutexattr_init(&the_mutex_attr);
    if (rc != 0) {
        fprintf(stderr, "Error initializing mutex attribute: (%d) %s\n",
                rc, strerror(rc));
        exit(1);
    }

    rc = pthread_mutexattr_settype(&the_mutex_attr,
                                   PTHREAD_MUTEX_ERRORCHECK_NP);
    if (rc != 0) {
        fprintf(stderr, "Error setting type in mutex attribute: (%d) %s\n",
                rc, strerror(rc));
        exit(1);
    }

    rc = pthread_mutex_init(&the_mutex, &the_mutex_attr);
    if (rc != 0) {
        fprintf(stderr, "Error initializing mutex: (%d) %s\n",
                rc, strerror(rc));
        exit(1);
    }

    for (t = 0; t < NUM_THREADS; t++) {
        rc = pthread_create(&thread_ids[t], NULL, thread_rtn, NULL);
        if (rc != 0) {
            fprintf(stderr, "Error creating thread number %d: (%d) %s\n",
                    t + 1, rc, strerror(rc));
            exit(1);
        }
    }

    sleep(LIFETIME_SECS);

    for (t = 0; t < NUM_THREADS; t++) {
        rc = pthread_cancel(thread_ids[t]);
        if (rc != 0) {
            fprintf(stderr, "Error cancelling thread number %d: (%d) %s\n",
                    t + 1, rc, strerror(rc));
            exit(1);
        }
    }

    for (t = 0; t < NUM_THREADS; t++) {
        rc = pthread_join(thread_ids[t], NULL);
        if (rc != 0) {
            fprintf(stderr, "Error joining thread number %d: (%d) %s\n",
                    t + 1, rc, strerror(rc));
            exit(1);
        }
    }

    return 0;
}

----- test program end ----

2. Consider what the program does.  It is the skeleton of some program that
   uses a mutex to protect critical sections of the code.  The main thread
   creates 5 threads, lets them run for a while, and then cancels and joins
   them.  The created threads serialize with each other using the mutex.  The
   threads reach a cancelation point while holding a lock on the mutex, due to
   the call to pthread_testcancel().  The threads push a cancelation cleanup
   handler.  The cancelation cleanup handler must unlock the mutex.

   Notice that the mutex was created with the PTHREAD_MUTEX_ERRORCHECK_NP
   attribute.  This causes the pthreads library to look for errors, such as a
   thread attempting to unlock a mutex on which it does not have a lock.

3. Run the program.  When the program is run, the pthread_mutex_unlock() call
   in the thread cancelation cleanup handler, thread_rtn_cleanup(), returns
   error code 1, EPERM.  Coming from pthread_mutex_unlock(), this code means
   the thread did not have a lock on the mutex.

4. Realize that the result in step 3 is a problem.  The pthreads library is
   not supposed to unlock the mutex automatically.

5. Note that this program runs correctly (no errors are reported) on Red Hat
   7.x.

I consider this problem severe.  The problem can lead to critical code
sections no longer being protected.
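Steps 2 and 3 depend on one property of error-checking mutexes: an unlock by a thread that does not own the mutex is rejected with EPERM instead of silently succeeding. The sketch below isolates that behaviour, assuming a POSIX.1-2008 system. It uses the portable PTHREAD_MUTEX_ERRORCHECK name rather than the glibc-specific PTHREAD_MUTEX_ERRORCHECK_NP used in tc1.c, and `ec_mutex`, `unlock_without_owning`, and `non_owner_unlock_result` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t ec_mutex;

/* Runs in a second thread that never locked ec_mutex. */
static void *unlock_without_owning(void *arg)
{
    int *rc = arg;
    *rc = pthread_mutex_unlock(&ec_mutex);
    return NULL;
}

/* Returns what a non-owning thread gets from pthread_mutex_unlock() on an
   error-checking mutex that another thread holds (EPERM expected). */
int non_owner_unlock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    int rc = -1;                                 /* -1: second thread never ran */

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&ec_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&ec_mutex);               /* this thread owns it */
    if (pthread_create(&tid, NULL, unlock_without_owning, &rc) == 0)
        pthread_join(tid, NULL);
    pthread_mutex_unlock(&ec_mutex);             /* owner's unlock succeeds */
    pthread_mutex_destroy(&ec_mutex);
    return rc;
}
```

Called from `main()`, `non_owner_unlock_result()` should return EPERM (error code 1 on Linux), the same value the cleanup handler in tc1.c reports.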


------- Additional Comment #1 From Eric M. Agar 2003-03-28 16:00 -------

Here's the output I see when I run the test program:

[root@rsctcl70 agar2]# ./tc1
Thread 16386 created.
Thread 32771 created.
Thread 49156 created.
Thread 65541 created.
Thread 81926 created.
32771: thread_rtn_cleanup: pthread_mutex_unlock() returned 1.
81926: thread_rtn_cleanup: pthread_mutex_unlock() returned 1.
65541: thread_rtn_cleanup: pthread_mutex_unlock() returned 1.
16386: thread_rtn_cleanup: pthread_mutex_unlock() returned 1.
49156: thread_rtn_cleanup: pthread_mutex_unlock() returned 1.


------- Additional Comment #2 From Eric M. Agar 2003-03-28 16:04 -------

The following program also shows the problem, without relying on the pthreads 
library to report the problem.

#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>


#define NUM_GROUPS      10
#define NUM_THREADS     10
#define LIFETIME_SECS   1
#define DELAY_USECS     1


static pthread_mutex_t  the_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_t        the_tid = 0;


void thread_rtn_cleanup(void *arg_p)
{
    pthread_t       me, actual, expect;


    me = pthread_self();

    expect = me;
    actual = the_tid;

    the_tid = 0;

    pthread_mutex_unlock(&the_mutex);

    if (actual != expect) {
        fprintf(stderr, "thread_rtn_cleanup: actual (%lu) != expected (%lu).\n",
                (unsigned long) actual, (unsigned long) expect);
    }

    return;
}


void *thread_rtn(void *arg)
{
    pthread_t       me, actual, expect;


    me = pthread_self();

    fprintf(stderr, "Thread %lu created.\n", (unsigned long) me);

    pthread_cleanup_push(thread_rtn_cleanup, NULL);

    while (1) {
        pthread_mutex_lock(&the_mutex);

        expect = 0;
        actual = the_tid; 

        the_tid = me;

        usleep(DELAY_USECS);    /* supposed to be thread cancellation point */
        pthread_testcancel();   /* force the issue of thread cancellation   */

        expect = me;
        actual = the_tid;

        the_tid = 0;

        pthread_mutex_unlock(&the_mutex);

        if (actual != expect) {
            fprintf(stderr, "thread_rtn: actual (%lu) != expected (%lu).\n",
                    (unsigned long) actual, (unsigned long) expect);
        }
    }

    pthread_cleanup_pop(0);

    return NULL;
}


int main(int argc, char **argv)
{
    int flag;
    int g, t;
    pthread_t thread_ids[NUM_THREADS];
    int rc;

    for (g = 0; g < NUM_GROUPS; g++) {

        for (t = 0; t < NUM_THREADS; t++) {
            rc = pthread_create(&thread_ids[t], NULL, thread_rtn, NULL);
            if (rc != 0) {
                fprintf(stderr, "Error creating thread number %d: (%d) %s\n",
                        t + 1, rc, strerror(rc));
                exit(1);
            }
        }

        sleep(LIFETIME_SECS);

        for (t = 0; t < NUM_THREADS; t++) {
            rc = pthread_cancel(thread_ids[t]);
            if (rc != 0) {
                fprintf(stderr, "Error cancelling thread number %d: (%d) %s\n",
                        t + 1, rc, strerror(rc));
                exit(1);
            }
        }

        for (t = 0; t < NUM_THREADS; t++) {
            rc = pthread_join(thread_ids[t], NULL);
            if (rc != 0) {
                fprintf(stderr, "Error joining thread number %d: (%d) %s\n",
                        t + 1, rc, strerror(rc));
                exit(1);
            }
        }

        fprintf(stderr, "Completed group %d.\n", g + 1);
    }

    return 0;
}


The type of output I see when I run it is:

Thread 1490946 created.
Thread 1507331 created.
Thread 1523716 created.
Thread 1540101 created.
Thread 1556486 created.
Thread 1572871 created.
Thread 1589256 created.
Thread 1605641 created.
Thread 1622026 created.
Thread 1638411 created.
thread_rtn_cleanup: actual (1589256) != expected (1490946).
thread_rtn_cleanup: actual (0) != expected (1507331).
thread_rtn_cleanup: actual (0) != expected (1523716).
thread_rtn_cleanup: actual (0) != expected (1540101).
thread_rtn_cleanup: actual (0) != expected (1556486).
thread_rtn_cleanup: actual (0) != expected (1572871).
thread_rtn_cleanup: actual (0) != expected (1589256).
thread_rtn_cleanup: actual (0) != expected (1605641).
thread_rtn_cleanup: actual (0) != expected (1622026).
thread_rtn_cleanup: actual (0) != expected (1638411).
Completed group 10.


------- Additional Comment #3 From Khoa D. Huynh 2003-03-28 17:18 -------

Salina - please look at this problem for me....Thanks.


------- Additional Comment #4 From Salina Chu 2003-03-31 11:59 -------

Hi,

Trying to re-create your problem ...

After installing RedHat 8.0,
   uname -a shows
Linux ltcvpwld.ltc.austin.ibm.com 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
  
   rpm -qa | grep glibc shows

glibc-2.2.93-5
glibc-common-2.2.93-5
glibc-devel-2.2.93-5
glibc-kernheaders-2.4-7.20

[root@ltcvpwld /]# rpm -qa | grep kernel
kernel-2.4.18-14
kernel-pcmcia-cs-3.1.31-9
kernel-source-2.4.18-14


I compiled and executed both your test programs and did not see any problems.
I have a uniprocessor (UP) NetVista machine.

Running the 1st test program (tc1)
Thread 8194 created.
Thread 16387 created.
Thread 24580 created.
Thread 32773 created.
Thread 40966 created.

Running the 2nd test program
Thread 8194 created.
Thread 16387 created.
Thread 24580 created.
Thread 32773 created.
Thread 40966 created.
Thread 49159 created.
Thread 57352 created.
Thread 65545 created.
Thread 73738 created.
Thread 81931 created.
Completed group 1.
Thread 90114 created.
Thread 98307 created.
Thread 106500 created.
Thread 114693 created.
Thread 122886 created.
Thread 131079 created.
Thread 139272 created.
Thread 147465 created.
Thread 155658 created.
Thread 163851 created.
Completed group 2.
Thread 172034 created.
Thread 180227 created.
Thread 188420 created.
Thread 196613 created.
Thread 204806 created.
Thread 212999 created.
Thread 221192 created.
Thread 229385 created.
Thread 237578 created.
Thread 245771 created.
Completed group 3.
Thread 253954 created.
Thread 262147 created.
Thread 270340 created.
Thread 278533 created.
Thread 286726 created.
Thread 294919 created.
Thread 303112 created.
Thread 311305 created.
Thread 319498 created.
Thread 327691 created.
Completed group 4.
Thread 335874 created.
Thread 344067 created.
Thread 352260 created.
Thread 360453 created.
Thread 368646 created.
Thread 376839 created.
Thread 385032 created.
Thread 393225 created.
Thread 401418 created.
Thread 409611 created.
Completed group 5.
Thread 417794 created.
Thread 425987 created.
Thread 434180 created.
Thread 442373 created.
Thread 450566 created.
Thread 458759 created.
Thread 466952 created.
Thread 475145 created.
Thread 483338 created.
Thread 491531 created.
Completed group 6.
Thread 499714 created.
Thread 507907 created.
Thread 516100 created.
Thread 524293 created.
Thread 532486 created.
Thread 540679 created.
Thread 548872 created.
Thread 557065 created.
Thread 565258 created.
Thread 573451 created.
Completed group 7.
Thread 581634 created.
Thread 589827 created.
Thread 598020 created.
Thread 606213 created.
Thread 614406 created.
Thread 622599 created.
Thread 630792 created.
Thread 638985 created.
Thread 647178 created.
Thread 655371 created.
Completed group 8.
Thread 663554 created.
Thread 671747 created.
Thread 679940 created.
Thread 688133 created.
Thread 696326 created.
Thread 704519 created.
Thread 712712 created.
Thread 720905 created.
Thread 729098 created.
Thread 737291 created.
Completed group 9.
Thread 745474 created.
Thread 753667 created.
Thread 761860 created.
Thread 770053 created.
Thread 778246 created.
Thread 786439 created.
Thread 794632 created.
Thread 802825 created.
Thread 811018 created.
Thread 819211 created.
Completed group 10.
  
On the Red Hat updates web site, for Red Hat 8.0,
there are glibc-2.3.2-4.80 rpms and kernel 2.4.18-27.8.0 rpms.
Are these the updates you are having problems with?
You did not tell us which kernel version you are using: are you using the 
default Red Hat 8.0 kernel or the updated kernel?
Please post the uname -a and rpm -qa | grep kernel output so we know which 
version you are using.
Also, is your machine UP or MP?

Thanks

Salina


    


------- Additional Comment #5 From Eric M. Agar 2003-03-31 13:06 -------

Here's the information you requested.  The machine has the glibc updates, but
not the kernel updates.

  # uname -a
  Linux rsctcl70.pok.ibm.com 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
  
  # rpm -qa | grep kernel
  kernel-2.4.18-14
  kernel-pcmcia-cs-3.1.31-9
  openafs-kernel-1.2.8-rh8.0.1
  
  # rpm -qa | grep glibc
  glibc-2.3.2-4.80
  glibc-kernheaders-2.4-7.20
  glibc-common-2.3.2-4.80
  glibc-devel-2.3.2-4.80
  glibc-debug-2.3.2-4.80

The following shows that the system has one processor:

  # cat /proc/cpuinfo
  processor       : 0
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 7
  model name      : Pentium III (Katmai)
  stepping        : 3
  cpu MHz         : 597.407
  cache size      : 512 KB
  fdiv_bug        : no
  hlt_bug         : no
  f00f_bug        : no
  coma_bug        : no
  fpu             : yes
  fpu_exception   : yes
  cpuid level     : 2
  wp              : yes
  flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
  bogomips        : 1187.96


I also tested on a machine with two processors, and saw the same problem.



------- Additional Comment #6 From Eric M. Agar 2003-03-31 13:13 -------

The updates for glibc were made to the system based on the information at:

http://rhn.redhat.com/errata/RHSA-2003-089.html#Red%20Hat%20Linux%208.0


------- Additional Comment #7 From Eric M. Agar 2003-03-31 13:16 -------

The updates to glibc were applied because we need the fix for the defect I 
reported to the LTC in bug report 1818.


------- Additional Comment #8 From Salina Chu 2003-03-31 16:09 -------

I updated the Red Hat 8.0 kernel to kernel-2.4.18-27.8.0 but did not update 
glibc.  Both pthread test programs ran fine.  

I then updated glibc to 2.3.2-4.80, and both test programs failed.
The problem seems to be in glibc, not the kernel.

I applied the same errata fix from the page you mentioned
https://rhn.redhat.com/errata/RHSA-2003-089.html
to a Red Hat 7.2 system, and both pthread test programs work there.
The glibc fix on 7.2 is glibc-2.2.4-32.
It looks like the Red Hat errata fix for 8.0 is broken.

I also don't know why, for Red Hat 8.0, an errata fix moved glibc from the 2.2
to the 2.3 level.
I talked to Khoa about this; we are going to assign the problem to Glen Johnson.
Glen, please report the problem to Red Hat and track it as an issue.
We should get the problem resolved faster via Red Hat, since it was one of their
errata fixes that caused it.  The GA-level code works fine.

Thanks,
Salina
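When the same test program behaves differently under two glibc builds, as it does here, it can help to have the program record which C library and threads implementation it actually ran against, rather than relying only on rpm output. A minimal sketch using the glibc-specific `gnu_get_libc_version()` and `confstr(_CS_GNU_LIBPTHREAD_VERSION, ...)` interfaces follows. `describe_thread_library` is a hypothetical helper name, and the exact strings returned (for example a LinuxThreads or NPTL version) depend on the installed glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <gnu/libc-version.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes "glibc <version>, <threads implementation>" into buf and returns
   buf, or returns NULL if the threads implementation cannot be queried. */
char *describe_thread_library(char *buf, size_t len)
{
    char threads[64];

    /* confstr() returns 0 if the variable is unsupported; longer values are
       truncated to fit the buffer and NUL-terminated. */
    if (confstr(_CS_GNU_LIBPTHREAD_VERSION, threads, sizeof threads) == 0)
        return NULL;
    snprintf(buf, len, "glibc %s, %s", gnu_get_libc_version(), threads);
    return buf;
}
```

Printing this string at the start of each run (after checking it is not NULL) would show directly which library build produced a given result.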


------- Additional Comment #9 From Salina Chu 2003-03-31 16:55 -------

Looks like this new glibc broke a few things.  From searching Red Hat Bugzilla,
it seems you can't even use gdb now.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=87581

There is a newer glibc mentioned in that bug report; you may want to give it a
shot.

Salina
Comment 1 Saurabh Desai 2003-03-31 20:06:08 EST
It looks like the asynchronous cancellation in glibc 2.3 broke this. Here,
pthread_mutex_lock() acts as a cancellation point, and that's wrong.
sigsuspend() now enables asynchronous cancellation, and a thread waiting in
pthread_mutex_lock() calls sigsuspend(). In this test program, a cancellation
request is already pending for a thread that is waiting for the mutex. Another
thread unlocks the mutex and sends a signal to restart the first thread; the
signal handler then checks for asynchronous cancellation and calls the cleanup
routine without the mutex having been acquired. The cleanup handler then tries
to unlock the mutex, which fails because its owner is NULL.
Here is a stack trace showing how the cleanup handler gets called:

#0  thread_rtn_cleanup (arg_p=0x0) at ltc_bug.c:26
#1  0x40023e08 in __pthread_perform_cleanup () from /lib/libpthread.so.0
#2  0x4002468f in __pthread_do_exit () from /lib/libpthread.so.0
#3  0x40028061 in pthread_handle_sigcancel () from /lib/libpthread.so.0
#4  <signal handler called>
#5  0x400995e8 in sigsuspend () from /lib/libc.so.6
#6  0x40027c58 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#7  0x40029f20 in __pthread_alt_lock () from /lib/libpthread.so.0
#8  0x4002613a in pthread_mutex_lock () from /lib/libpthread.so.0
#9  0x08048841 in thread_rtn (arg=0x0) at ltc_bug.c:51
#10 0x40025c70 in pthread_start_thread_event () from /lib/libpthread.so.0

Where this program works (under glibc-2.2.93), the stack trace looks like:

#0  thread_rtn_cleanup (arg_p=0x0) at ltc_bug.c:24
#1  0x4002aee8 in __pthread_perform_cleanup () from /lib/i686/libpthread.so.0
#2  0x4002b655 in __pthread_do_exit () from /lib/i686/libpthread.so.0
#3  0x4002ad85 in pthread_testcancel () from /lib/i686/libpthread.so.0
#4  0x08048869 in thread_rtn (arg=0x0) at ltc_bug.c:54
#5  0x4002c941 in pthread_start_thread () from /lib/i686/libpthread.so.0
#6  0x4002ca45 in pthread_start_thread_event () from /lib/i686/libpthread.so.0

pthread_mutex_lock() is not a cancellation point under POSIX.
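The point of the analysis above, that a pending cancellation must not be acted on inside pthread_mutex_lock(), can be checked with a more deterministic test than tc1.c. This is a sketch, not the test used in this bug: the main thread holds an error-checking mutex, the worker blocks in pthread_mutex_lock(), the cancellation is requested while it waits, and the mutex is then released. As in tc1.c, the cleanup handler is pushed before the lock, so a library that wrongly cancels inside pthread_mutex_lock() would run the handler without owning the mutex and its unlock would fail with EPERM. `blocked_mutex`, `record_unlock`, `blocked_worker`, and `cancel_while_blocked_in_lock` are hypothetical names, and the 100 ms pause is an arbitrary assumption; the expected result does not depend on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

static pthread_mutex_t blocked_mutex;
static int handler_unlock_rc = -1;      /* -1 means the handler never ran */

/* Cleanup handler: records what pthread_mutex_unlock() reports. */
static void record_unlock(void *arg)
{
    (void) arg;
    handler_unlock_rc = pthread_mutex_unlock(&blocked_mutex);
}

static void *blocked_worker(void *arg)
{
    (void) arg;
    pthread_cleanup_push(record_unlock, NULL);   /* pushed before locking, as in tc1.c */
    /* Not a cancellation point: must return with the mutex held even
       though a cancellation request is pending. */
    pthread_mutex_lock(&blocked_mutex);
    pthread_testcancel();                        /* pending request acted on here */
    pthread_cleanup_pop(0);
    pthread_mutex_unlock(&blocked_mutex);        /* reached only if not cancelled */
    return NULL;
}

/* Returns the handler's unlock result: 0 means the worker owned the mutex
   when it was cancelled. Returns -2 or -3 if the setup itself fails. */
int cancel_while_blocked_in_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    void *result = NULL;
    struct timespec delay = { 0, 100L * 1000 * 1000 };   /* 100 ms */

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&blocked_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&blocked_mutex);          /* main thread owns the mutex */
    if (pthread_create(&tid, NULL, blocked_worker, NULL) != 0) {
        pthread_mutex_unlock(&blocked_mutex);
        return -3;
    }
    nanosleep(&delay, NULL);                     /* let the worker block */
    pthread_cancel(tid);                         /* request is now pending */
    pthread_mutex_unlock(&blocked_mutex);        /* worker can now acquire it */
    pthread_join(tid, &result);
    pthread_mutex_destroy(&blocked_mutex);

    if (result != PTHREAD_CANCELED)
        return -2;
    return handler_unlock_rc;
}
```

On a conforming library, `cancel_while_blocked_in_lock()` should return 0; a nonzero value such as EPERM would indicate the handler ran without the lock, which is the symptom reported in this bug.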
Comment 2 IBM Bug Proxy 2003-04-01 11:27:17 EST
*** Bug 87704 has been marked as a duplicate of this bug. ***
Comment 3 Jakub Jelinek 2003-04-02 04:57:10 EST
I've built glibc 2.3.2-4.80.4 in ftp://people.redhat.com/jakub/glibc/errata/8.0/
which should fix this. Does it work for you?
Comment 4 Salina Chu 2003-04-02 11:28:46 EST
Hi Jakub,
Thanks for rebuilding 2.3.2-4.80.4 rpm.  I have downloaded and installed the 
new ones.  They work with the 2 testcases.
Asking Eric Agar to use the new rpms to see if he runs into any other problems.
Salina
Comment 5 IBM Bug Proxy 2003-04-02 16:49:30 EST
------- Additional Comment #20 From Eric M. Agar  2003-04-02 15:58 -------

I downloaded the following rpms from 
ftp://people.redhat.com/jakub/glibc/errata/8.0

glibc-devel-2.3.2-4.80.4
glibc-2.3.2-4.80.4
glibc-debug-2.3.2-4.80.4
glibc-common-2.3.2-4.80.4

I verified the test programs I included in this defect report now work properly 
with these rpms.

More importantly, the product code that had experienced problems with this 
defect now appears to be working correctly.

Thanks again to everyone who responded to this defect report in such a timely 
fashion.

What is the process from here?  How and when do these fixes become officially 
available from Red Hat or any other distribution that may have the same 
problem?  This code ultimately comes from the Free Software Foundation, doesn't it?

Comment 6 Jakub Jelinek 2003-04-09 15:21:35 EDT
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2003-136.html
Comment 7 IBM Bug Proxy 2003-04-09 17:01:22 EDT
Will Red Hat put out an errata fix only for RH 9.0?
Is there a plan to put out an official fix for RH 8.0,
or are we expected to use the 9.0 errata fix on RH 8?

Comment 8 Jakub Jelinek 2003-04-09 17:04:30 EDT
glibc-2.3.2-4.80.6 is in QA hands ATM.
Comment 9 Jakub Jelinek 2003-04-10 19:09:24 EDT
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2003-089.html
Comment 10 masanari iida 2005-04-26 04:46:27 EDT
I want to confirm with Jakub whether this bug (87656) is fixed in glibc for RHAS 2.1.
I spent some time looking for this bug ID on the errata report web page and in the
RPM changelog, but I couldn't find it.
