I'm setting this to "blocker" since this should be a BIG issue at Red Hat. As you can see from the report the whole server can come unglued. But only in this context which I (the reporter) does not know if others code like this. If this is a rare case then "critical" would better describe the Severity PROBLEM SUMMARY =============== When a program has created some threads, and those threads create more threads, the pthreads library appears to become confused to the point that threads are created with the wrong arguments. ORIGINAL SYMPTOMS OF PROBLEM ============================ I first saw this problem in a program where the thread argument was a pointer to allocated memory. Each thread created was to be provided with the address of allocated memory that was for its exclusive use. Once the thread was done with the allocated memory, it would free the memory. But I saw that every now and then multiple threads were being given the same address, causing the same memory to be freed twice. In order to convince myself that the problem was not with how my program was managing its memory or serializing itself, I devised a test program to illustrate the problem. TEST PROGRAM TO ILLUSTRATE THE PROBLEM ====================================== The threads of the test program fall into 3 categories. First, there's the initial or primary thread. Next, there are the secondary threads. All the secondary threads are created by the primary thread, and then the primary thread waits for the secondary threads to end. The number of secondary threads created is specified by the SEC_THREADS constant (see the source code below). The secondary threads create the tertiary threads. The number of tertiary threads created by a secondary thread at once is specified by the SEC_THREADS_PER_REP constant. Each secondary thread creates tertiary threads, and then waits for those tertiary threads to end. This is done in a loop. The number of times the loop is executed is specified by the SEC_THREADS_REP constant. The total number of tertiary threads created during the life of the program is specified by the constant TER_THREADS. As each secondary thread is created, the thread id returned by pthread_create() is stored in an element of the static array cr_sec_tids. As each tertiary thread is created, the thread id returned by pthread_create() is stored in an element of the static array cr_ter_tids. When a secondary thread starts executing its routine, sec_thread_rtn(), it will record its thread ID, as determined by pthread_self(), in the element of the ex_sec_tids array specified by the thread's argument. But, first, it will make sure no other thread has put a thread ID in the element. When a tertiary thread starts executing its routine, ter_thread_rtn(), it will record its thread ID, as determined by pthread_self(), in the element of the ex_ter_tids array specified by the thread's argument. First, it will make sure no other thread has put a thread ID in the element. When everything is working properly, no thread should ever detect that another thread has recorded a thread ID in the element specified for it. The ex_sec_tids array has one element for every secondary thread to be created during the run of the program, and the ex_ter_tids array has one element for every tertiary thread to be created during the run of the program. However, on a SMP system with 2 CPUs, and the level of glibc indicated below, tertiary threads detect that some other thread has been given the address of the same array element. Here's some information about the system. The system has Redhat 7.1 installed. # rpm -qa | grep glibc glibc-devel-2.2.2-10 compat-glibc-6.2-2.1.3.2 glibc-common-2.2.2-10 glibc-2.2.2-10 glibc-profile-2.2.2-10 # cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 5 model name : Pentium II (Deschutes) stepping : 0 cpu MHz : 333.059 cache size : 512 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr bogomips : 665.19 processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 5 model name : Pentium II (Deschutes) stepping : 0 cpu MHz : 333.059 cache size : 512 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr bogomips : 665.19 Here's the source code of the test program, threadarg.c: ======================= source code for threadarg.c =========================== #include <pthread.h> #include <unistd.h> #include <stdlib.h> #include <stdio.h> #include <string.h> #define SEC_THREADS 10 #define SEC_THREADS_REP 20000 #define TER_THREADS_PER_REP 2 #define TER_THREADS SEC_THREADS * SEC_THREADS_REP * TER_THREADS_PER_REP static pthread_t cr_sec_tids[SEC_THREADS]; static pthread_t ex_sec_tids[SEC_THREADS]; static pthread_t cr_ter_tids[TER_THREADS]; static pthread_t ex_ter_tids[TER_THREADS]; void * ter_thread_rtn(void *arg) { pthread_t me; pthread_t *ex_ter_tid_p; int ter_tid_ndx; me = pthread_self(); ex_ter_tid_p = (pthread_t *)arg; ter_tid_ndx = ex_ter_tid_p - ex_ter_tids; if (ter_tid_ndx < 0 || ter_tid_ndx >= TER_THREADS) { fprintf(stderr, "%u: Got invalid tertiary thread index: %d\n", me, ter_tid_ndx); return NULL; } if (*ex_ter_tid_p != 0) { fprintf(stderr, "%u: ex_ter_tid[%u] already set to %u!\n", me, ter_tid_ndx, *ex_ter_tid_p); return NULL; } *ex_ter_tid_p = me; return NULL; } void * sec_thread_rtn(void *arg) { pthread_t me; int i, r, rc; pthread_t *ex_sec_tid_p; int sec_tid_ndx, ter_tid_ndx0, ter_tid_ndx; me = pthread_self(); ex_sec_tid_p = (pthread_t *)arg; sec_tid_ndx = ex_sec_tid_p - ex_sec_tids; if (sec_tid_ndx < 0 || sec_tid_ndx >= SEC_THREADS) { fprintf(stderr, "%u: Got invalid secondary thread index: %d\n", me, sec_tid_ndx); return NULL; } if (*ex_sec_tid_p != 0) { fprintf(stderr, "%u: ex_sec_tids[%u] already set to %u!\n", me, sec_tid_ndx, *ex_sec_tid_p); return NULL; } *ex_sec_tid_p = me; ter_tid_ndx0 = (sec_tid_ndx * SEC_THREADS_REP * TER_THREADS_PER_REP); for (r = 0; r < SEC_THREADS_REP; r++) { ter_tid_ndx = ter_tid_ndx0 + (r * TER_THREADS_PER_REP); for (i = 0; i < TER_THREADS_PER_REP; i++, ter_tid_ndx++) { rc = pthread_create(&cr_ter_tids[ter_tid_ndx], NULL, ter_thread_rtn, &ex_ter_tids[ter_tid_ndx]); if (rc != 0) { fprintf(stderr, "Error creating tertiary thread with " "index %d: (%d) %s\n", ter_tid_ndx, rc, strerror(rc)); exit(1); } } ter_tid_ndx = ter_tid_ndx0 + (r * TER_THREADS_PER_REP); for (i = 0; i < TER_THREADS_PER_REP; i++, ter_tid_ndx++) { rc = pthread_join(cr_ter_tids[ter_tid_ndx], NULL); if (rc != 0) { fprintf(stderr, "Error joining tertiary thread with " "index %d, thread ID %u: (%d) %s\n", ter_tid_ndx, cr_ter_tids[ter_tid_ndx], rc, strerror(rc)); /* keep going */ } } } return NULL; } int main(int argc, char **argv) { int i, rc; memset(cr_sec_tids, '\0', sizeof cr_sec_tids); memset(ex_sec_tids, '\0', sizeof ex_sec_tids); memset(cr_ter_tids, '\0', sizeof cr_ter_tids); memset(ex_ter_tids, '\0', sizeof ex_ter_tids); for (i = 0; i < SEC_THREADS; i++) { rc = pthread_create(&cr_sec_tids[i], NULL, sec_thread_rtn, &ex_sec_tids[i]); if (rc != 0) { fprintf(stderr, "Error creating secondary thread with index %d: (%d) %s\n", i, rc, strerror(rc)); exit(1); } } for (i = 0; i < SEC_THREADS; i++) { rc = pthread_join(cr_sec_tids[i], NULL); if (rc != 0) { fprintf(stderr, "Error joining secondary thread with " "index %d, thread ID %u: (%d) %s\n", i, cr_sec_tids[i], rc, strerror(rc)); /* keep going */ } } for (i = 0; i < SEC_THREADS; i++) { if (ex_sec_tids[i] != cr_sec_tids[i]) { fprintf(stderr, "For index %d, secondary thread id mismatch: %u %u.\n", i, cr_sec_tids[i], ex_sec_tids[i]); /* keep going */ } } for (i = 0; i < TER_THREADS; i++) { if (ex_ter_tids[i] != cr_ter_tids[i]) { fprintf(stderr, "For index %d, tertiary thread id mismatch: %u %u.\n", i, cr_ter_tids[i], ex_ter_tids[i]); /* keep going */ } } return 0; } ======================= source code for threadarg.c =========================== RUNNING THE TEST PROGRAM ======================== Here's how I compiled threadarg.c: # cc threadarg.c -o threadarg -lpthread -lc Here's the output of a run of the test program: 2782236: ex_ter_tid[120271] already set to 2781211! 5587993: ex_ter_tid[120769] already set to 5586965! 9749534: ex_ter_tid[40861] already set to 9748509! 11813914: ex_ter_tid[41171] already set to 11812886! 13165601: ex_ter_tid[122531] already set to 13164576! Error joining tertiary thread with index 122630, thread ID 13540382: (22) Invalid argument Error joining tertiary thread with index 122680, thread ID 13733922: (22) Invalid argument 14037027: ex_ter_tid[122752] already set to 14036002! Error joining tertiary thread with index 122817, thread ID 14271535: (22) Invalid argument Error joining tertiary thread with index 122819, thread ID 14275636: (22) Invalid argument Error joining tertiary thread with index 122863, thread ID 14445620: (22) Invalid argument Error joining tertiary thread with index 41985, thread ID 16016508: (22) Invalid argument Error joining tertiary thread with index 123310, thread ID 16020616: (22) Invalid argument Error joining tertiary thread with index 123311, thread ID 16020616: (22) Invalid argument Error joining tertiary thread with index 123983, thread ID 18274487: (22) Invalid argument Error joining tertiary thread with index 42411, thread ID 18327752: (22) Invalid argument Error creating tertiary thread with index 201571: (11) Resource temporarily unavailable Some of the tertiary threads detect the element of the ex_ter_tids array they've been given already has some thread ID recorded in the element. This should never happen, since no two pthread_create() invocations specify the same element. This indicates to me the pthreads library is confused and creating some threads with identical arguments by mistake. Once this has happened there is another problem: some of the calls to pthread_join() fail -- perhaps because some data structure is corrupted in the pthreads library. (During other runs of the test program the pthread_join errno value was 3, instead of 22). Then eventually new threads cannot be created because of the unavailability of resources. That should not happen in this program. The program certainly creates a lot of threads -- over 400,000 if the program runs to completion, but only about 30 threads can be active at any one time. ENVIRONMENTS WHERE THE TEST PROGRAM SHOWED NO PROBLEMS ====================================================== I ran the test program on a uniprocessor system running Redhat 6.2. The test program ran successfully to completion. I don't know if this was because the system only has one processor or because it has a different version of the pthreads library installed. Here's information about that system. # rpm -q -a | grep libc glibc-2.1.3-15 compat-glibc-5.2-2.0.7.2 glibc-devel-2.1.3-15 glibc-profile-2.1.3-15 libc-5.3.12-31 # cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 5 model name : Pentium II (Deschutes) stepping : 1 cpu MHz : 299.613059 cache size : 512 KB fdiv_bug : no hlt_bug : no sep_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr bogomips : 299.01 I also ran the test program on a non-Intel SMP running another flavor of Unix. This system has a different implementation of pthreads. The test program ran successfully on that system. http://www.redhat.com/bugzilla/show_bug.cgi?id=43741 -
Jakub sent me this test program for the thread library some time back. I attach a slightly modified version below shortly. It shows some extremely strange effects. I (and somebody else very much familiar with the thread stuff) spent time to look at the program and couldn't find anything wrong with it. Now the strange thing: the thread library itself seems not to be involved in the problems. Running the same program on a quad Alpha also doesn't show any problem. I do think meanwhile that this is a problem with the memory handling for x86 in the kernel. Let me start by showing some output of the program: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 155288599: ex_ter_tid[55152].th already set to 155287561! 55152: cr_ter_tids(155287561) != ex_ter_tids(155288599) 55153: cr_ter_tids(155288599) != ex_ter_tids(0) 55154: cr_ter_tids(155296771) != ex_ter_tids(155299849) 55155: cr_ter_tids(155299849) != ex_ter_tids(0) 55156: cr_ter_tids(155308057) != ex_ter_tids(155311107) 55157: cr_ter_tids(155311107) != ex_ter_tids(0) 55158: cr_ter_tids(155319324) != ex_ter_tids(155322371) 55159: cr_ter_tids(155322371) != ex_ter_tids(0) 55160: cr_ter_tids(155328529) != ex_ter_tids(155333635) 55161: cr_ter_tids(155333635) != ex_ter_tids(0) [...] 155432990: ex_ter_tid[255197].th already set to 155431965! 55178: cr_ter_tids(155429911) != ex_ter_tids(155439113) 55179: cr_ter_tids(155439113) != ex_ter_tids(0) 255198: cr_ter_tids(155432990) != ex_ter_tids(0) 55180: cr_ter_tids(155443224) != ex_ter_tids(155451401) 55181: cr_ter_tids(155451401) != ex_ter_tids(0) Error joining tertiary thread with index 255200, thread ID 155446299: (3) No such process [and so on...] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Let's start with the first line. As far as I can say something like this is always the first line. Maybe this is the indication of the original problem and the rest is just fallout. The message 155288599: ex_ter_tid[55152].th already set to 155287561! means that although the entry 55152 the array is used for the first time it already contains a value (the arrays are cleared at program start). You can look through the code, there is no other assignment. Unless the compiler generates somewhere different code this can only happen if an assignment somewhere went wrong. Please note that the assignments to ex_ter_tid all happen in the tertiary threads, the ones which are created and terminated with very high rate. My current guess is that the kernel at some point fails to set up the correct mapping and the assignment ex_ter_tid_p = (arr_t *)arg; in ter_thread_rtn() goes somewhere else. This first failure has a ripple effect. The lines 55152: cr_ter_tids(155287561) != ex_ter_tids(155288599) 55153: cr_ter_tids(155288599) != ex_ter_tids(0) mean that ex_ter_tids[55152] is wrong. Instead of the correct thread ID (155287561) the ID 155288599. If you look at the first message this is not surprising since the thread with the ID 155288599 assumes it has to write at the index 55152. The index is derived from the value of a parameter which is passed via clone() to the new thread. This almost always works, just not in this case. This pattern continues though most of the time the assignment mismatch is not detected. That's not a big surprise since it is a race. The symptoms are almost the same: the second thread of the group (tertiary threads are always launched in a group of two) writes the thread ID in the wrong array slot, leaving its own entry zero and overwriting the other threads ID. This is why you see these pairs. Occasionally even the joining fails. pthread_join() only uses the cr_ter_tids array which is normally not corrupted. *** I really cannot explain this behavior. The thread library seems fine. First we thought the pthread_join() would return too early but this can't be it since the ex_ter_tids() array gets written to in the wrong place. The problem seems not to have anything to do with the LDT usage. I've updated my machine to 2.4.8pre1 which should have the latest patch for LDT handling included. I think I've also tested the code with a i386 version of libpthread which is not using the LDT altogether. On UP machines or SMP machines with most all but one processor already busy the problem shows up not that often. I even managed to get 6 out of 10 runs without any problem. As mentioned above, I've tested on my quad Alpha without any problems. I don't have the resources available on the SMP IA-64 machines we have here so I couldn't test it and I don't know how to use the resources in Durham. I would appreciate if you can run the program on SMP machines of other architectures. So far it seems only x86 is affected which probably points at the kernel. *** Some words about the program. I've modified it to add another test and output (in sec_thread_rtn after the pthread_join call). And I've introduced the arr_t type which allows to add padding between the thread descriptors.
Created attachment 25811 [details] Modified test case to see the problem easier
My current theory: Without looking in the kernel sources I would guess that a succesful clone() call just creates the basic data structures to create the new thread but does not allocate all the resources. How the program and the thread library works is that there is never more than one clone() call going on at the same time. This is completely serialized. But there are many clone() calls. The initial threads creates 10 threads which in turn each create 2 threads over and over again. I.e., if the newly created threads are not scheduled immediately there can be many threads hanging around in limbo. Now, how does the kernel create the initial contexts for the thread, especially, how does it set the parameter? This is what seems to be going wrong. Two threads seem to get the same parameter even though the clone() syscall specifies different ones. And judging from the environment, this happens if the two threads are scheduled at the same time on different CPUs. Maybe this is giving somebody a clue.
Setting architecture to i386 and category to glibc (for now). This should probably be kernel.
This should be fixed with http://sources.redhat.com/ml/libc-alpha/2001-09/msg00114.html, at least the testcase no longer fails to me.
Closing this bug as our testing shows this is no longer an issue in version 7.2. (See LTC128)