Bug 87704 - LTC2324-Thread improperly loses lock on mutex when thread is cancelled.
Status: CLOSED DUPLICATE of bug 87656
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 8.0
Hardware: i686
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Brian Brock
Reported: 2003-04-01 15:53 UTC by IBM Bug Proxy
Modified: 2016-11-24 14:52 UTC

Doc Type: Bug Fix
Last Closed: 2006-02-21 18:52:25 UTC

Description IBM Bug Proxy 2003-04-01 15:53:19 UTC
A thread's lock on a mutex appears to be released when the thread is
cancelled, which violates the defined behavior of pthreads.

If a thread can be cancelled while holding a lock on a mutex, it is the
application's responsibility to unlock the mutex in a thread cancellation
cleanup handler.  I have product code that does this.  That code no longer
works correctly on Red Hat 8.0 with glibc-2.3.2-4.80.
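
A minimal sketch of the pattern described above, assuming a default mutex and
deferred cancellation.  The names demo_mutex, unlock_on_cancel, worker, and
run_cancel_demo are hypothetical and are not taken from the product code.  The
worker locks the mutex, pushes a handler that unlocks it, and is then cancelled
at pthread_testcancel() while it still holds the lock:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int cleanup_unlock_rc = -1;  /* set by the handler when it runs */

/* Cancellation cleanup handler: releases the mutex the thread still holds. */
static void unlock_on_cancel(void *arg)
{
    cleanup_unlock_rc = pthread_mutex_unlock((pthread_mutex_t *) arg);
}

static void *worker(void *arg)
{
    int rc;

    (void) arg;

    rc = pthread_mutex_lock(&demo_mutex);
    assert(rc == 0);

    pthread_cleanup_push(unlock_on_cancel, &demo_mutex);

    for (;;) {
        pthread_testcancel();   /* cancellation point inside the critical section */
        sched_yield();
    }

    pthread_cleanup_pop(1);     /* never reached; pairs with the push above */

    return NULL;
}

/* Returns 0 if the handler ran and left the mutex unlocked, -1 otherwise. */
int run_cancel_demo(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;

    /* The thread must have been cancelled and its unlock must have succeeded. */
    if (result != PTHREAD_CANCELED || cleanup_unlock_rc != 0)
        return -1;

    /* The mutex must be available again once the cancelled thread is gone. */
    if (pthread_mutex_trylock(&demo_mutex) != 0)
        return -1;
    pthread_mutex_unlock(&demo_mutex);

    return 0;
}
```

Calling run_cancel_demo() from main() and checking for a return value of 0 is
enough to exercise the sketch.  Link with -lpthread.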

Hardware Environment:

xSeries, but I don't think that's important.

Software Environment:

Red Hat 8.0 with glibc 2.3.2-4.80, as shown below:

  # rpm -qa | grep glibc
  glibc-2.3.2-4.80
  glibc-kernheaders-2.4-7.20
  glibc-common-2.3.2-4.80
  glibc-devel-2.3.2-4.80
  glibc-debug-2.3.2-4.80


Steps to Reproduce:

1. Compile the following test program, named tc1.c: "make LDLIBS=-lpthread tc1"

----- test program beginning ----

#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <assert.h>


#define NUM_THREADS     5
#define LIFETIME_SECS   1


static pthread_mutexattr_t  the_mutex_attr;
static pthread_mutex_t      the_mutex;


void thread_rtn_cleanup(void *arg_p)
{
    pthread_t me;
    int       rc;

    me = pthread_self();

    rc = pthread_mutex_unlock(&the_mutex);

    if (rc != 0) {
        /* pthread_t is not an int; cast it to unsigned long for printing */
        fprintf(stderr, "%lu: thread_rtn_cleanup: pthread_mutex_unlock() "
                "returned %d.\n", (unsigned long) me, rc);
    }

    return;
}


void *thread_rtn(void *arg)
{
    pthread_t me;
    int       rc;


    me = pthread_self();

    fprintf(stderr, "Thread %lu created.\n", (unsigned long) me);

    pthread_cleanup_push(thread_rtn_cleanup, NULL);

    while (1) {
        rc = pthread_mutex_lock(&the_mutex);
        assert(rc == 0);

        pthread_testcancel();   /* force the issue of thread cancellation   */

        rc = pthread_mutex_unlock(&the_mutex);

        if (rc != 0) {
            fprintf(stderr, "%lu: thread_rtn: pthread_mutex_unlock() "
                    "returned %d.\n", (unsigned long) me, rc);
        }
    }

    pthread_cleanup_pop(0);

    return NULL;
}


int main(int argc, char **argv)
{
    pthread_t thread_ids[NUM_THREADS];
    int t, rc;


    rc = pthread_mutexattr_init(&the_mutex_attr);
    if (rc != 0) {
        fprintf(stderr, "Error initializing mutex attribute: (%d) %s\n",
                rc, strerror(rc));
        exit(1);
    }

    rc = pthread_mutexattr_settype(&the_mutex_attr,
                                   PTHREAD_MUTEX_ERRORCHECK_NP);
    if (rc != 0) {
        fprintf(stderr, "Error setting type in mutex attribute: (%d) %s\n",
                rc, strerror(rc));
        exit(1);
    }

    rc = pthread_mutex_init(&the_mutex, &the_mutex_attr);
    if (rc != 0) {
        fprintf(stderr, "Error initializing mutex: (%d) %s\n",
                rc, strerror(rc));
        exit(1);
    }

    for (t = 0; t < NUM_THREADS; t++) {
        rc = pthread_create(&thread_ids[t], NULL, thread_rtn, NULL);
        if (rc != 0) {
            fprintf(stderr, "Error creating thread number %d: (%d) %s\n",
                    t + 1, rc, strerror(rc));
            exit(1);
        }
    }

    sleep(LIFETIME_SECS);

    for (t = 0; t < NUM_THREADS; t++) {
        rc = pthread_cancel(thread_ids[t]);
        if (rc != 0) {
            fprintf(stderr, "Error cancelling thread number %d: (%d) %s\n",
                    t + 1, rc, strerror(rc));
            exit(1);
        }
    }

    for (t = 0; t < NUM_THREADS; t++) {
        rc = pthread_join(thread_ids[t], NULL);
        if (rc != 0) {
            fprintf(stderr, "Error joining thread number %d: (%d) %s\n",
                    t + 1, rc, strerror(rc));
            exit(1);
        }
    }

    return 0;
}

----- test program end ----

2. Consider what the program does.  It is the skeleton of a program that
   uses a mutex to protect critical sections of the code.  The main thread
   creates 5 threads, lets them run for a while, and then cancels and joins
   them.  The created threads serialize with each other using the mutex.  The
   threads reach a cancellation point while holding a lock on the mutex, due
   to the call to pthread_testcancel().  The threads push a cancellation
   cleanup handler, and that handler must unlock the mutex.

   Notice that the mutex was created with the PTHREAD_MUTEX_ERRORCHECK_NP
   attribute.  This causes the pthreads library to look for errors, such as a
   thread attempting to unlock a mutex on which it does not have a lock.

3. Run the program.  When the program is run, the pthread_mutex_unlock() call
   in the thread cancellation cleanup handler, thread_rtn_cleanup(), returns
   error code 1, EPERM.  Coming from pthread_mutex_unlock(), this code means
   the thread did not have a lock on the mutex.

4. Realize that the result in step 3 is a problem.  The pthreads library is
   not supposed to unlock the mutex automatically when a thread is cancelled.

5. Note that this program runs correctly (no errors are reported) on Red Hat
   7.x.
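
The check that PTHREAD_MUTEX_ERRORCHECK_NP enables can also be seen in
isolation.  The sketch below is an illustration only and is not part of the
reported test case.  It assumes the portable name PTHREAD_MUTEX_ERRORCHECK is
available; checked_mutex, unlock_without_owning, and unlock_rc_from_non_owner
are hypothetical names.  A second thread tries to unlock a mutex that the
calling thread holds:

```c
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t checked_mutex;

/* Thread body: unlocks a mutex it never locked and returns the error code. */
static void *unlock_without_owning(void *arg)
{
    (void) arg;

    return (void *) (intptr_t) pthread_mutex_unlock(&checked_mutex);
}

/* Returns the code seen by the non-owning thread, or -1 on a setup failure. */
int unlock_rc_from_non_owner(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    void *result = NULL;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK) != 0)
        return -1;
    if (pthread_mutex_init(&checked_mutex, &attr) != 0)
        return -1;
    pthread_mutexattr_destroy(&attr);

    /* The calling thread becomes the owner of the mutex. */
    rc = pthread_mutex_lock(&checked_mutex);
    assert(rc == 0);

    if (pthread_create(&tid, NULL, unlock_without_owning, NULL) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;

    /* The owner still holds the lock and releases it normally. */
    rc = pthread_mutex_unlock(&checked_mutex);
    assert(rc == 0);
    pthread_mutex_destroy(&checked_mutex);

    return (int) (intptr_t) result;
}
```

POSIX specifies EPERM for this case with an error-checking mutex, which is the
same code the cleanup handler reports in step 3.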

I consider this problem severe.  The problem can lead to critical code
sections no longer being protected.


------- Additional Comment #1 From Eric M. Agar 2003-03-28 16:00 -------

Here's the output I see when I run the test program:

[root@rsctcl70 agar2]# ./tc1
Thread 16386 created.
Thread 32771 created.
Thread 49156 created.
Thread 65541 created.
Thread 81926 created.
32771: thread_rtn_cleanup: pthread_mutex_unlock() returned 1.
81926: thread_rtn_cleanup: pthread_mutex_unlock() returned 1.
65541: thread_rtn_cleanup: pthread_mutex_unlock() returned 1.
16386: thread_rtn_cleanup: pthread_mutex_unlock() returned 1.
49156: thread_rtn_cleanup: pthread_mutex_unlock() returned 1.


------- Additional Comment #2 From Eric M. Agar 2003-03-28 16:04 -------

The following program also shows the problem, without relying on the pthreads 
library to report the problem.

#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>


#define NUM_GROUPS      10
#define NUM_THREADS     10
#define LIFETIME_SECS   1
#define DELAY_USECS     1


static pthread_mutex_t  the_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_t        the_tid = 0;


void thread_rtn_cleanup(void *arg_p)
{
    pthread_t       me, actual, expect;


    me = pthread_self();

    expect = me;
    actual = the_tid;

    the_tid = 0;

    pthread_mutex_unlock(&the_mutex);

    if (actual != expect) {
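        /* pthread_t is not an int; cast it to unsigned long for printing */
        fprintf(stderr,
                "thread_rtn_cleanup: actual (%lu) != expected (%lu).\n",
                (unsigned long) actual, (unsigned long) expect);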
    }

    return;
}


void *thread_rtn(void *arg)
{
    pthread_t       me, actual, expect;


    me = pthread_self();

    fprintf(stderr, "Thread %lu created.\n", (unsigned long) me);

    pthread_cleanup_push(thread_rtn_cleanup, NULL);

    while (1) {
        pthread_mutex_lock(&the_mutex);

        expect = 0;
        actual = the_tid; 

        the_tid = me;

        usleep(DELAY_USECS);    /* supposed to be thread cancellation point */
        pthread_testcancel();   /* force the issue of thread cancellation   */

        expect = me;
        actual = the_tid;

        the_tid = 0;

        pthread_mutex_unlock(&the_mutex);

        if (actual != expect) {
            fprintf(stderr, "thread_rtn: actual (%lu) != expected (%lu).\n",
                    (unsigned long) actual, (unsigned long) expect);
        }
    }

    pthread_cleanup_pop(0);

    return NULL;
}


int main(int argc, char **argv)
{
    int flag;
    int g, t;
    pthread_t thread_ids[NUM_THREADS];
    int rc;

    for (g = 0; g < NUM_GROUPS; g++) {

        for (t = 0; t < NUM_THREADS; t++) {
            rc = pthread_create(&thread_ids[t], NULL, thread_rtn, NULL);
            if (rc != 0) {
                fprintf(stderr, "Error creating thread number %d: (%d) %s\n",
                        t + 1, rc, strerror(rc));
                exit(1);
            }
        }

        sleep(LIFETIME_SECS);

        for (t = 0; t < NUM_THREADS; t++) {
            rc = pthread_cancel(thread_ids[t]);
            if (rc != 0) {
                fprintf(stderr, "Error cancelling thread number %d: (%d) %s\n",
                        t + 1, rc, strerror(rc));
                exit(1);
            }
        }

        for (t = 0; t < NUM_THREADS; t++) {
            rc = pthread_join(thread_ids[t], NULL);
            if (rc != 0) {
                fprintf(stderr, "Error joining thread number %d: (%d) %s\n",
                        t + 1, rc, strerror(rc));
                exit(1);
            }
        }

        fprintf(stderr, "Completed group %d.\n", g + 1);
    }

    return 0;
}


The type of output I see when I run it is:

Thread 1490946 created.
Thread 1507331 created.
Thread 1523716 created.
Thread 1540101 created.
Thread 1556486 created.
Thread 1572871 created.
Thread 1589256 created.
Thread 1605641 created.
Thread 1622026 created.
Thread 1638411 created.
thread_rtn_cleanup: actual (1589256) != expected (1490946).
thread_rtn_cleanup: actual (0) != expected (1507331).
thread_rtn_cleanup: actual (0) != expected (1523716).
thread_rtn_cleanup: actual (0) != expected (1540101).
thread_rtn_cleanup: actual (0) != expected (1556486).
thread_rtn_cleanup: actual (0) != expected (1572871).
thread_rtn_cleanup: actual (0) != expected (1589256).
thread_rtn_cleanup: actual (0) != expected (1605641).
thread_rtn_cleanup: actual (0) != expected (1622026).
thread_rtn_cleanup: actual (0) != expected (1638411).
Completed group 10.


------- Additional Comment #3 From Khoa D. Huynh 2003-03-28 17:18 -------

Salina - please look at this problem for me....Thanks.


------- Additional Comment #4 From Salina Chu 2003-03-31 11:59 -------

Hi,

Trying to re-create your problem ...

After installing Red Hat 8.0,
   uname -a shows
Linux ltcvpwld.ltc.austin.ibm.com 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
  
   rpm -qa | grep glibc shows

glibc-2.2.93-5
glibc-common-2.2.93-5
glibc-devel-2.2.93-5
glibc-kernheaders-2.4-7.20

[root@ltcvpwld /]# rpm -qa | grep kernel
kernel-2.4.18-14
kernel-pcmcia-cs-3.1.31-9
kernel-source-2.4.18-14


I compiled and executed both your test programs and did not see any problems.
I have a UP netvista machine.

Running 1st tcl program
Thread 8194 created.
Thread 16387 created.
Thread 24580 created.
Thread 32773 created.
Thread 40966 created.

Running 2nd tcl program
Thread 8194 created.
Thread 16387 created.
Thread 24580 created.
Thread 32773 created.
Thread 40966 created.
Thread 49159 created.
Thread 57352 created.
Thread 65545 created.
Thread 73738 created.
Thread 81931 created.
Completed group 1.
Thread 90114 created.
Thread 98307 created.
Thread 106500 created.
Thread 114693 created.
Thread 122886 created.
Thread 131079 created.
Thread 139272 created.
Thread 147465 created.
Thread 155658 created.
Thread 163851 created.
Completed group 2.
Thread 172034 created.
Thread 180227 created.
Thread 188420 created.
Thread 196613 created.
Thread 204806 created.
Thread 212999 created.
Thread 221192 created.
Thread 229385 created.
Thread 237578 created.
Thread 245771 created.
Completed group 3.
Thread 253954 created.
Thread 262147 created.
Thread 270340 created.
Thread 278533 created.
Thread 286726 created.
Thread 294919 created.
Thread 303112 created.
Thread 311305 created.
Thread 319498 created.
Thread 327691 created.
Completed group 4.
Thread 335874 created.
Thread 344067 created.
Thread 352260 created.
Thread 360453 created.
Thread 368646 created.
Thread 376839 created.
Thread 385032 created.
Thread 393225 created.
Thread 401418 created.
Thread 409611 created.
Completed group 5.
Thread 417794 created.
Thread 425987 created.
Thread 434180 created.
Thread 442373 created.
Thread 450566 created.
Thread 458759 created.
Thread 466952 created.
Thread 475145 created.
Thread 483338 created.
Thread 491531 created.
Completed group 6.
Thread 499714 created.
Thread 507907 created.
Thread 516100 created.
Thread 524293 created.
Thread 532486 created.
Thread 540679 created.
Thread 548872 created.
Thread 557065 created.
Thread 565258 created.
Thread 573451 created.
Completed group 7.
Thread 581634 created.
Thread 589827 created.
Thread 598020 created.
Thread 606213 created.
Thread 614406 created.
Thread 622599 created.
Thread 630792 created.
Thread 638985 created.
Thread 647178 created.
Thread 655371 created.
Completed group 8.
Thread 663554 created.
Thread 671747 created.
Thread 679940 created.
Thread 688133 created.
Thread 696326 created.
Thread 704519 created.
Thread 712712 created.
Thread 720905 created.
Thread 729098 created.
Thread 737291 created.
Completed group 9.
Thread 745474 created.
Thread 753667 created.
Thread 761860 created.
Thread 770053 created.
Thread 778246 created.
Thread 786439 created.
Thread 794632 created.
Thread 802825 created.
Thread 811018 created.
Thread 819211 created.
Completed group 10.
  
On the Red Hat updates web site, for Red Hat 8.0,
there are glibc-2.3.2-4.80 rpms and kernel 2.4.18-27.8.0 rpms.
Are these the updates you are having problems with?
You did not tell us which kernel version you are using.  Are you using the
default Red Hat 8.0 kernel or the updated kernel?
Please show the uname -a and rpm -qa | grep kernel output so we know which
version you are using.
Also, is your machine UP or MP?

Thanks

Salina


    


------- Additional Comment #5 From Eric M. Agar 2003-03-31 13:06 -------

Here's the info. you requested.  The machine has the glibc updates, but not
the kernel updates.

  # uname -a
  Linux rsctcl70.pok.ibm.com 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
  
  # rpm -qa | grep kernel
  kernel-2.4.18-14
  kernel-pcmcia-cs-3.1.31-9
  openafs-kernel-1.2.8-rh8.0.1
  
  # rpm -qa | grep glibc
  glibc-2.3.2-4.80
  glibc-kernheaders-2.4-7.20
  glibc-common-2.3.2-4.80
  glibc-devel-2.3.2-4.80
  glibc-debug-2.3.2-4.80

The following shows that the system has one processor:

  # cat /proc/cpuinfo
  processor       : 0
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 7
  model name      : Pentium III (Katmai)
  stepping        : 3
  cpu MHz         : 597.407
  cache size      : 512 KB
  fdiv_bug        : no
  hlt_bug         : no
  f00f_bug        : no
  coma_bug        : no
  fpu             : yes
  fpu_exception   : yes
  cpuid level     : 2
  wp              : yes
  flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
  bogomips        : 1187.96


I also tested on a machine with two processors, and saw the same problem.



------- Additional Comment #6 From Eric M. Agar 2003-03-31 13:13 -------

The updates for glibc were made to the system based on the information at:

http://rhn.redhat.com/errata/RHSA-2003-089.html#Red%20Hat%20Linux%208.0


------- Additional Comment #7 From Eric M. Agar 2003-03-31 13:16 -------

The updates to glibc were applied because we need the fix for the defect I 
reported to the LTC in bug report 1818.


------- Additional Comment #8 From Salina Chu 2003-03-31 16:09 -------

I updated the Red Hat 8.0 kernel to kernel-2.4.18-27.8.0 but did not update
glibc.  Both pthread test programs ran fine.

I then updated glibc to 2.3.2-4.80, and both test programs now fail.
The problem seems to be in glibc, not the kernel.
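
Since the failure follows the C library rather than the kernel, it can help to
confirm which library a process is actually running against.  A small sketch,
assuming a glibc that supports the GNU confstr() names _CS_GNU_LIBC_VERSION
and _CS_GNU_LIBPTHREAD_VERSION; query_confstr and print_runtime_versions are
hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Copies the confstr() value for `name` into buf.  Returns the size the full
 * value needs (including the terminating NUL), or 0 if the name is not
 * supported.  The copy is truncated if the return value exceeds len.
 */
size_t query_confstr(int name, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    buf[0] = '\0';

    return confstr(name, buf, len);
}

/* Prints the C library and thread library versions, when available. */
void print_runtime_versions(void)
{
    char buf[128];

    if (query_confstr(_CS_GNU_LIBC_VERSION, buf, sizeof buf) > 0)
        printf("C library:      %s\n", buf);

    if (query_confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof buf) > 0)
        printf("Thread library: %s\n", buf);
}
```

Calling print_runtime_versions() at the start of a test program records the
library versions alongside the test output, which complements the rpm -qa
listings.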

I applied the same errata fix from the page you mentioned,
https://rhn.redhat.com/errata/RHSA-2003-089.html,
to a Red Hat 7.2 system, and both pthread test programs work there.
The glibc fix there is at glibc-2.2.4-32.
It looks like the Red Hat errata fix for 8.0 is broken.

I also don't know why, for Red Hat 8.0, the glibc version went from the 2.2
level to the 2.3 level for an errata fix.
I talked to Khoa about this, and we are going to assign the problem to Glen
Johnson.
Glen, please report the problem to Red Hat and track it as an issue.
We should get the problem resolved faster via Red Hat, since it is one of
their errata fixes that caused the problem.  GA level code works fine.

Thanks,
Salina


------- Additional Comment #9 From Salina Chu 2003-03-31 16:55 -------

It looks like this new glibc broke a few things.  Searching Red Hat Bugzilla
shows that gdb cannot even be used now.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=87581

A new glibc is mentioned in that bug report; you may want to give it a
shot.

Salina

Comment 1 IBM Bug Proxy 2003-04-01 16:27:14 UTC
Sorry, please ignore - duplicates 87656

*** This bug has been marked as a duplicate of 87656 ***

Comment 2 Red Hat Bugzilla 2006-02-21 18:52:25 UTC
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.

