From Bugzilla Helper:
User-Agent: Mozilla/4.75 [ja] (WinNT; U)

Description of problem:
My very simple test program for the pthread library stops on IA-64 Linux. The same program works on ix86 (RH7.1). The glibc version is 2.2.2-10.

How reproducible:
Always

Steps to Reproduce:
1. Compile my test program (gcc -Wall pthread.c -lpthread).
2. Run it.
3.

Actual Results:
On IA-64 the program stops partway through (on ix86 it ends normally). The point at which it stops differs from run to run.

Additional info:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define DEBUG
#define ARRAYSIZ 512
#define MAX_THREAD_NUM 16
#define REPEAT 10000

int ary[ARRAYSIZ];

typedef struct {
    int fork_id;
    int thread_id;
} arg_t;

/* calc_func() is run as the body of each thread. */
/* Just calculate something. */
void *calc_func( void *arg )
{
    int i;
    int fork_id, thread_id, sum;

    fork_id = ((arg_t *)arg)->fork_id;
    thread_id = ((arg_t *)arg)->thread_id;
#ifdef DEBUG
    printf( "para_func start --- thread=%05d-%05d\n", fork_id, thread_id );
#endif
    for (i = 0, sum = 0; i < MAX_THREAD_NUM; i++)
        sum += ary[i];
#ifdef DEBUG
    printf( "sum=%d(%d-%d)\n", sum, fork_id, thread_id );
    printf( "para_func end ----- thread=%05d-%05d\n", fork_id, thread_id );
#endif
    return( NULL );
}

/* para_fork() is called from main(). */
/* Creates the threads and waits for them to exit. */
void para_fork( int fork_id, void *(*thread_func)(void *arg) )
{
    arg_t arg[MAX_THREAD_NUM];
    pthread_t thread[MAX_THREAD_NUM];
    int i;

#ifdef DEBUG
    printf("--- fork start ---\n\n");
#endif
    for (i = 0; i < MAX_THREAD_NUM; i++) {
        arg[i].fork_id = fork_id;
        arg[i].thread_id = i;
    }

    /* Create threads */
    for (i = 0; i < MAX_THREAD_NUM; i++)
        if (pthread_create( &thread[i], NULL, thread_func, (void *)&arg[i] ) != 0)
            printf( "error! : thread(%d) not created\n", i );

    /* Wait for threads */
    for (i = 0; i < MAX_THREAD_NUM; i++) {
        pthread_join( thread[i], NULL );
        pthread_detach( thread[i] );
    }
#ifdef DEBUG
    printf("\n--- fork end ---\n");
#endif
}

int main( void )
{
    int i;

    /* Initialize the array for calculation */
    for (i = 0; i < ARRAYSIZ; i++)
        ary[i] = i;

    /* Call para_fork() to create threads */
    for (i = 0; i < REPEAT; i++)
        para_fork( i, calc_func );

    return( 0 );
}
Created attachment 18273 [details] my test program
detaching a join'd thread makes zero sense.
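For reference, a minimal sketch of the create/join pattern without the detach call. This is not the reporter's attachment; fork_and_join() and noop_func() are hypothetical names used only for illustration. It also joins only the threads that were actually created, since joining a pthread_t that pthread_create() never filled in is not valid either.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_THREAD_NUM 16

/* Hypothetical thread body for this sketch: does nothing. */
void *noop_func(void *arg)
{
    (void)arg;
    return NULL;
}

/* Create up to MAX_THREAD_NUM threads, join every thread that was
 * actually created, and return the number of successful joins. */
int fork_and_join(void *(*thread_func)(void *))
{
    pthread_t thread[MAX_THREAD_NUM];
    int created = 0, joined = 0, i, rc;

    for (i = 0; i < MAX_THREAD_NUM; i++) {
        rc = pthread_create(&thread[created], NULL, thread_func, NULL);
        if (rc != 0) {
            /* pthread functions return the error number directly. */
            fprintf(stderr, "thread(%d) not created: %s\n", i, strerror(rc));
            continue;
        }
        created++;
    }

    for (i = 0; i < created; i++) {
        /* A successful join already releases the thread's resources,
         * so no pthread_detach() call follows. */
        if (pthread_join(thread[i], NULL) == 0)
            joined++;
    }
    return joined;
}
```

Calling fork_and_join(noop_func) from main() and comparing the result with MAX_THREAD_NUM shows whether every thread was created and joined.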
It is reproducible even without it. In the debugging I did, the thread descriptor (at the top of the thread stack) got cleared at some point between calling __clone2 and returning from it. Apparently the child had run in between: the thread descriptor was still OK at the place where the child did _exit, but when __clone2 returned, it was all zeros. Vanilla 2.4.4 does not exhibit this problem.
fixed in kernel-2.4.3-6.99.1.
In most cases, the program above works well on the new kernel-2.4.3-12. But sometimes we see a very large system time when running another program that calls malloc() to allocate a large amount of memory. For such requests, glibc uses the "mmap" function in the kernel (as you know, mmap is a very slow function). We tried to solve this issue with the environment variables MALLOC_MMAP_MAX and MALLOC_TRIM_THRESHOLD, as below:

export MALLOC_MMAP_MAX=0 # Don't use mmap
export MALLOC_TRIM_THRESHOLD=4194304

We also changed HEAP_MAX_SIZE to (8*1024*1024) in malloc.c in glibc-2.2.2-10. The results are:

1) The program which creates 1 thread and uses big-memory malloc() works well.
2) The program which creates 2 threads and uses big-memory malloc() "almost" works well, but is sometimes very slow.
3) On the Pentium 3 2-way (SMP) machine, programs 1) and 2) work well.
4) On the Itanium 3-way and 8-way machines, programs 1) and 2) "almost" work well, but are sometimes very slow.

So a machine with more than 2 CPUs still has the issue above. Is this a pthread bug? Or is it just a matter of kernel/shell/glibc settings?
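The same two settings can also be applied from inside the program with mallopt(), which rules out a misspelled environment variable. The sketch below is only an illustration, assuming glibc's <malloc.h>; apply_malloc_settings() and big_alloc_works() are hypothetical helper names, and the HEAP_MAX_SIZE change is not covered because it requires rebuilding glibc. Note that glibc documents these environment variables with a trailing underscore (MALLOC_MMAP_MAX_, MALLOC_TRIM_THRESHOLD_), so the spelling in the exports above may be worth double-checking.

```c
#include <malloc.h>
#include <stdlib.h>

/* Apply the two settings from the report inside the process.
 * Returns 1 if both mallopt() calls succeeded, 0 otherwise. */
int apply_malloc_settings(void)
{
    /* Same intent as MALLOC_MMAP_MAX=0: do not serve requests via mmap. */
    int ok_mmap = mallopt(M_MMAP_MAX, 0);

    /* Same value as MALLOC_TRIM_THRESHOLD=4194304 (4 MiB). */
    int ok_trim = mallopt(M_TRIM_THRESHOLD, 4 * 1024 * 1024);

    return ok_mmap == 1 && ok_trim == 1;
}

/* Allocate one large block and touch both ends of it.
 * Returns 1 on success, 0 if malloc() failed. */
int big_alloc_works(size_t bytes)
{
    char *p = malloc(bytes);

    if (p == NULL)
        return 0;
    p[0] = 1;
    p[bytes - 1] = 1;
    free(p);
    return 1;
}
```

Calling apply_malloc_settings() at the start of main(), before any thread is created, keeps the settings in effect for every later allocation.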
You can get the test programs from: ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools/ncbi.tar.gz ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools/data.tar.gz
This defect is considered MUST-FIX for the Fairfax gold release.
Does this persist on 2.4.7-2 or later?
Yes, unfortunately. It's fine on 2.4.7-2, but not on 2.4.7-2smp. A simple test program is attached below. We suspect this issue might depend on the linuxthreads (pthread) library...
Created attachment 29317 [details] Pthread test program
Extract the tgz, run make, and execute "pttst" to run the test program. The program creates 8 threads and context-switches between them 10000 times, and this is repeated 50 times. On an SMP kernel the program stops, because the kernel stops servicing the pthreads. We examined the glibc (linuxthreads) library, but could not find the cause. _pthread_alt_unlock() (in linuxthreads/spinlock.c), which is called by pthread_mutex_unlock(), might have a problem. _pthread_alt_unlock() looks for sleeping threads by walking the thread queue and, if needed, wakes a thread in the queue. While the issue is occurring, the queue is never checked, so the threads in the queue are never woken.
# Or, in rare cases, the pointer at the head of the queue
# is suddenly cleared to NULL.
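For readers without the attachment, here is a rough sketch of this kind of test. It is not the attached pttst: the ring structure, the helper names, and the use of a condition variable are all assumptions made for illustration. Eight threads pass a "turn" around under one mutex, so every handoff goes through pthread_mutex_lock()/pthread_mutex_unlock() and forces a wakeup of a sleeping thread.

```c
#include <pthread.h>

#define NTHREADS 8

/* Shared state for the hypothetical handoff ring. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int turn;   /* id of the thread allowed to run next */
    int done;   /* handoffs completed so far */
    int limit;  /* handoffs to perform in total */
} ring_t;

typedef struct {
    ring_t *ring;
    int id;
} worker_arg_t;

static void *worker(void *p)
{
    worker_arg_t *a = p;
    ring_t *r = a->ring;

    pthread_mutex_lock(&r->lock);
    for (;;) {
        /* Sleep until it is this thread's turn or the run is over. */
        while (r->turn != a->id && r->done < r->limit)
            pthread_cond_wait(&r->cond, &r->lock);
        if (r->done >= r->limit)
            break;
        r->done++;
        r->turn = (r->turn + 1) % NTHREADS;
        pthread_cond_broadcast(&r->cond);
    }
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* Run `limit` handoffs across NTHREADS threads.
 * Returns the number of handoffs performed, or -1 if a thread
 * could not be created. */
int run_handoff(int limit)
{
    ring_t r;
    pthread_t th[NTHREADS];
    worker_arg_t arg[NTHREADS];
    int created = 0, failed = 0, i, result;

    pthread_mutex_init(&r.lock, NULL);
    pthread_cond_init(&r.cond, NULL);
    r.turn = 0;
    r.done = 0;
    r.limit = limit;

    for (i = 0; i < NTHREADS; i++) {
        arg[i].ring = &r;
        arg[i].id = i;
        if (pthread_create(&th[i], NULL, worker, &arg[i]) != 0) {
            failed = 1;
            break;
        }
        created++;
    }

    if (failed) {
        /* Tell the threads that did start to stop waiting. */
        pthread_mutex_lock(&r.lock);
        r.done = r.limit;
        pthread_cond_broadcast(&r.cond);
        pthread_mutex_unlock(&r.lock);
    }

    for (i = 0; i < created; i++)
        pthread_join(th[i], NULL);

    result = failed ? -1 : r.done;
    pthread_cond_destroy(&r.cond);
    pthread_mutex_destroy(&r.lock);
    return result;
}
```

Calling run_handoff(10000) in a loop of 50 iterations approximates the workload described above. On a healthy system every call returns; a call that never returns would correspond to the stall reported here.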
OK, I have been running the test for an hour now with our latest kernel, and it seems to work (i.e. it keeps running).
We tested "pttst" again on RH7.1 for Itanium, with the updated packages kernel-2.4.9-6smp and glibc-2.2.4-19. Unfortunately we still see the same issue on that system. Where is your latest (and tested) kernel? Could you please specify the package versions on your test system, especially kernel and glibc? Again, the test machine must have more than 2 CPUs and an SMP kernel.
If the machine has only 2 CPUs, it may be hard to reproduce this issue. We've tested this on machines with 4 CPUs and 8 CPUs. On the 8-CPU machine we can reproduce it more easily.
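Since reproduction depends on the CPU count, a test harness could check it before running. A small sketch, assuming a system where sysconf() supports _SC_NPROCESSORS_ONLN (glibc does); online_cpus() and likely_to_reproduce() are hypothetical helper names, and the "more than 2" threshold is taken from the comments above.

```c
#include <unistd.h>

/* Number of CPUs currently online, or -1 if it cannot be determined. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Per the report, the issue needs more than 2 CPUs to show up reliably. */
int likely_to_reproduce(long cpus)
{
    return cpus > 2;
}
```

A harness would call likely_to_reproduce(online_cpus()) and print a warning when it returns 0, rather than reporting a pass on a machine that is unlikely to trigger the problem.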
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/