Bug 40533

Summary: Simple pthread program is stopped.
Product: [Retired] Red Hat Linux
Reporter: Shinya Narahara <naraha_s>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Version: 7.3
CC: jakub
Hardware: ia64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:39:00 UTC
Attachments:
my test program (flags: none)
Pthread test program (flags: none)

Description Shinya Narahara 2001-05-14 12:43:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.75 [ja] (WinNT; U)

Description of problem:
My test program for the pthread library, which is very simple, stops
before completing on IA-64 Linux. The same program works on ix86 (RH7.1).
The glibc version is 2.2.2-10.

How reproducible:
Always

Steps to Reproduce:
1. Compile my test program (gcc -Wall pthread.c -lpthread).
2. Run it.


Actual Results:  On IA-64, the run is interrupted before it completes (on
ix86 it ends normally).
The point at which it is interrupted varies from run to run.


Additional info:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define DEBUG
#define ARRAYSIZ       512
#define MAX_THREAD_NUM 16
#define REPEAT         10000

int  ary[ARRAYSIZ];

typedef struct {
    int  fork_id;
    int  thread_id;
} arg_t;



/* calc_func() runs as the body of each thread. */
/* It just calculates something. */
void *calc_func( void *arg )
{
  int i;
  int fork_id, thread_id, sum;

  fork_id = ((arg_t *)arg)->fork_id;
  thread_id = ((arg_t *)arg)->thread_id;
#ifdef DEBUG
  printf( "para_func start --- thread=%05d-%05d\n",fork_id,thread_id);
#endif

  for (i = 0, sum = 0; i < MAX_THREAD_NUM; i++)
    sum += ary[i];
#ifdef DEBUG
  printf( "sum=%d(%d-%d)\n", sum, fork_id, thread_id );
  printf( "para_func end ----- thread=%05d-%05d\n", fork_id,thread_id );
#endif
  return( NULL );
}



/* para_fork() is called from main(). */
/* It creates the threads and waits for them to exit. */
void para_fork( int  fork_id, void *(*thread_func)(void *arg) )
{
  arg_t          arg[MAX_THREAD_NUM];
  pthread_t      thread[MAX_THREAD_NUM];
  int            i;

#ifdef DEBUG
  printf("--- fork start ---\n\n");
#endif
  for (i = 0; i < MAX_THREAD_NUM; i++) {
    arg[i].fork_id = fork_id;
    arg[i].thread_id = i;
  }

  /* Create threads */
  for (i = 0; i < MAX_THREAD_NUM; i++)
    if(pthread_create( &thread[i], NULL, thread_func, (void *)&arg[i] ) != 0)
      printf( "error! : thread(%d) not created\n", i );

  /* Wait threads */
  for (i = 0; i < MAX_THREAD_NUM; i++) {
    pthread_join( thread[i], NULL );
    pthread_detach( thread[i] );
  }

#ifdef DEBUG
  printf("\n--- fork end ---\n");
#endif
}




int  main( void )
{
  int i;

  /* Initialize the array for calculation */
  for(i = 0; i < ARRAYSIZ; i++)
    ary[i] = i;

  /* Call para_fork() to create threads */
  for(i = 0; i < REPEAT; i++)
    para_fork( i, calc_func );
  return( 0 );
}

Comment 1 Shinya Narahara 2001-05-14 12:44:20 UTC
Created attachment 18273
my test program

Comment 2 Elliot Lee 2001-05-16 01:04:39 UTC
detaching a join'd thread makes zero sense.
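
To illustrate the point above: a joinable thread is reaped once, by pthread_join(), and needs no pthread_detach() afterwards. The following is a minimal sketch, not the reporter's program; worker() and run_and_join() are hypothetical names, and the cap of 64 threads is an arbitrary assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_THREADS 64   /* arbitrary cap chosen for this sketch */

/* Hypothetical worker: bumps the counter it was handed. */
static void *worker(void *arg)
{
    int *value = arg;
    *value += 1;
    return NULL;
}

/* Create nthreads joinable threads and reap each one exactly once.
 * Returns the number of threads that were created and joined. */
int run_and_join(int nthreads, int *counters)
{
    pthread_t tid[SKETCH_MAX_THREADS];
    int created[SKETCH_MAX_THREADS] = {0};
    int joined = 0;

    assert(counters != NULL);
    assert(nthreads >= 0 && nthreads <= SKETCH_MAX_THREADS);

    for (int i = 0; i < nthreads; i++)
        created[i] = (pthread_create(&tid[i], NULL, worker, &counters[i]) == 0);

    for (int i = 0; i < nthreads; i++) {
        if (!created[i])
            continue;   /* never join a thread that was not created */
        /* pthread_join() releases the thread's resources; no detach needed. */
        if (pthread_join(tid[i], NULL) == 0)
            joined++;
    }
    return joined;
}
```

A caller would pass an array of nthreads zero-initialized ints. The alternative design is to call pthread_detach() and never join, but the two should not be combined on the same thread.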

Comment 3 Jakub Jelinek 2001-05-16 08:05:42 UTC
It is reproducible even without the pthread_detach() call.
In my debugging, the thread descriptor (at the top of the thread stack) was
cleared at some point between calling __clone2 and returning from it.
Apparently the child ran in between, and the thread descriptor was still
intact at the point where the child called _exit, but by the time __clone2
returned it was all zeros.
Vanilla 2.4.4 does not exhibit this problem.

Comment 4 Bill Nottingham 2001-05-21 18:27:00 UTC
Fixed in kernel-2.4.3-6.99.1.

Comment 5 Shinya Narahara 2001-06-25 13:16:37 UTC
In most cases, the program above works well on the new kernel-2.4.3-12.
But sometimes we see a very large system time when running another program
that calls malloc() to allocate large blocks of memory. For such requests, glibc uses
the kernel's "mmap" call (as you know, mmap is a very slow call).
We tried to work around this issue with the environment variables MALLOC_MMAP_MAX and MALLOC_TRIM_THRESHOLD, as below:
export MALLOC_MMAP_MAX=0   # Don't use mmap
export MALLOC_TRIM_THRESHOLD=4194304
We also set HEAP_MAX_SIZE to (8*1024*1024) in malloc.c in glibc-2.2.2-10.
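
For reference, the same two tunables can also be set from inside a program with glibc's mallopt(). This is a minimal sketch assuming glibc's <malloc.h>; configure_malloc() is a hypothetical helper name, and the sketch does not cover the HEAP_MAX_SIZE change, which is a compile-time constant in malloc.c. Note that the glibc documentation spells the environment variables with a trailing underscore (MALLOC_MMAP_MAX_ and MALLOC_TRIM_THRESHOLD_), so it may be worth checking which spelling was actually exported.

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific: mallopt(), M_MMAP_MAX, M_TRIM_THRESHOLD */

/* Apply the two settings at program start, before any large allocation.
 * Returns 1 if both mallopt() calls succeeded, 0 otherwise. */
int configure_malloc(int trim_threshold)
{
    assert(trim_threshold >= 0);

    /* Never satisfy a request with mmap(); use the heap instead. */
    int ok_mmap = mallopt(M_MMAP_MAX, 0);

    /* Only return memory to the kernel once this much is free at the top. */
    int ok_trim = mallopt(M_TRIM_THRESHOLD, trim_threshold);

    return ok_mmap && ok_trim;
}
```

Calling configure_malloc(4194304) early in main() would mirror the values quoted above.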

The results are:
1) A program that creates 1 thread and calls malloc() for large allocations works well.
2) A program that creates 2 threads and calls malloc() for large allocations "almost" works well, but is sometimes very slow.
3) On a 2-way Pentium 3 (SMP) machine, programs 1) and 2) work well.
4) On 3-way and 8-way Itanium machines, programs 1) and 2) "almost" work well, but are sometimes very slow.

So a machine with more than 2 CPUs still has the issue above.
Is this a pthread bug? Or is it just a matter of kernel/shell/glibc settings?


Comment 6 Shinya Narahara 2001-06-25 13:20:21 UTC
You can get the test programs from:
ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools/ncbi.tar.gz
ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools/data.tar.gz


Comment 7 Glen Foster 2001-07-13 22:07:07 UTC
This defect is considered MUST-FIX for Fairfax gold-release.

Comment 8 Bill Nottingham 2001-08-16 04:14:55 UTC
Does this persist on 2.4.7-2 or later?

Comment 9 Shinya Narahara 2001-08-24 03:00:19 UTC
Yes, unfortunately. It's fine on 2.4.7-2, but not on 2.4.7-2smp.
A simple test program is attached below.
We suspect this issue may depend on the linuxthreads (pthread) library...

Comment 10 Shinya Narahara 2001-08-24 03:03:21 UTC
Created attachment 29317
Pthread test program

Comment 11 Shinya Narahara 2001-08-24 04:34:33 UTC
Extract the tgz, run make, and execute "pttst" to run the test program.
The program creates 8 threads and context-switches
10000 times, repeated in a loop 50 times.
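
The attachment itself is not reproduced in this report. As a rough idea of what such a test can look like, here is a minimal sketch in which the threads pass a turn around a ring using a mutex and a condition variable, so that every handoff forces a context switch. This is an assumption about the general shape of the test, not the attached pttst source; ring_t, ring_worker() and run_ring() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

#define RING_MAX 64   /* arbitrary cap chosen for this sketch */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  turn_changed;
    int  nthreads;
    long handoffs_left;   /* handoffs still to perform */
    int  turn;            /* id of the thread allowed to run next */
    long done;            /* handoffs completed so far */
} ring_t;

typedef struct { ring_t *ring; int id; } ring_arg_t;

static void *ring_worker(void *p)
{
    ring_arg_t *a = p;
    ring_t *r = a->ring;

    pthread_mutex_lock(&r->lock);
    for (;;) {
        /* Sleep until it is our turn or the run is over. */
        while (r->turn != a->id && r->handoffs_left > 0)
            pthread_cond_wait(&r->turn_changed, &r->lock);
        if (r->handoffs_left == 0)
            break;
        r->handoffs_left--;
        r->done++;
        r->turn = (a->id + 1) % r->nthreads;   /* pass the turn on */
        pthread_cond_broadcast(&r->turn_changed);
    }
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* Run one round; returns the number of handoffs actually completed. */
long run_ring(int nthreads, long handoffs)
{
    ring_t r;
    ring_arg_t args[RING_MAX];
    pthread_t tid[RING_MAX];
    int created = 0;

    assert(nthreads > 0 && nthreads <= RING_MAX && handoffs >= 0);

    pthread_mutex_init(&r.lock, NULL);
    pthread_cond_init(&r.turn_changed, NULL);
    r.nthreads = nthreads;
    r.handoffs_left = handoffs;
    r.turn = 0;
    r.done = 0;

    for (int i = 0; i < nthreads; i++) {
        args[i].ring = &r;
        args[i].id = i;
        if (pthread_create(&tid[i], NULL, ring_worker, &args[i]) != 0) {
            /* Abort the run so already-started threads do not wait forever. */
            pthread_mutex_lock(&r.lock);
            r.handoffs_left = 0;
            pthread_cond_broadcast(&r.turn_changed);
            pthread_mutex_unlock(&r.lock);
            break;
        }
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);

    pthread_mutex_destroy(&r.lock);
    pthread_cond_destroy(&r.turn_changed);
    return r.done;
}
```

Calling run_ring(8, 10000) fifty times in a loop would match the numbers quoted above. A run that hangs would show up as run_ring() never returning.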

On an SMP kernel, this program stops because the kernel
stops servicing the pthreads.

We looked into the glibc (linuxthreads) library, but we
could not find the cause. _pthread_alt_unlock() (in
linuxthreads/spinlock.c), called by pthread_mutex_unlock(),
might have a problem. _pthread_alt_unlock() checks for
sleeping threads by walking the thread queue and, if
needed, wakes a thread in the queue. When we see this
issue, the queue is never checked, and hence
the threads in the queue are never woken.
# Or, in rare cases, the pointer at the head of the queue
# is suddenly cleared to NULL.
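
To make the suspected failure mode concrete, here is a toy model of a lock whose release path wakes the first thread in a wait queue. It is built on top of an ordinary pthread mutex and condition variables and is not the linuxthreads code; every name in it (qlock_t, waiter_t, qlock_demo() and so on) is hypothetical. It only shows why a release that skips the queue, or sees a NULL head, would leave the queued threads asleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not linuxthreads code. */
typedef struct waiter {
    struct waiter *next;
    pthread_cond_t wake;
    int            granted;   /* set by the releasing thread */
} waiter_t;

typedef struct {
    pthread_mutex_t guard;    /* protects held, head and tail */
    int             held;
    waiter_t       *head, *tail;
} qlock_t;

void qlock_init(qlock_t *l)
{
    pthread_mutex_init(&l->guard, NULL);
    l->held = 0;
    l->head = l->tail = NULL;
}

void qlock_acquire(qlock_t *l)
{
    waiter_t me;

    pthread_mutex_lock(&l->guard);
    if (!l->held) {
        l->held = 1;
        pthread_mutex_unlock(&l->guard);
        return;
    }
    /* Lock is busy: append ourselves to the queue and sleep. */
    me.next = NULL;
    me.granted = 0;
    pthread_cond_init(&me.wake, NULL);
    if (l->tail) l->tail->next = &me; else l->head = &me;
    l->tail = &me;
    while (!me.granted)
        pthread_cond_wait(&me.wake, &l->guard);
    pthread_mutex_unlock(&l->guard);
    pthread_cond_destroy(&me.wake);
}

void qlock_release(qlock_t *l)
{
    pthread_mutex_lock(&l->guard);
    waiter_t *w = l->head;
    if (w) {
        /* Hand the lock to the first sleeper. If this branch were skipped,
         * or head were wrongly NULL, queued threads would sleep forever. */
        l->head = w->next;
        if (!l->head) l->tail = NULL;
        w->granted = 1;
        pthread_cond_signal(&w->wake);
    } else {
        l->held = 0;
    }
    pthread_mutex_unlock(&l->guard);
}

typedef struct { qlock_t *lock; long *counter; int iters; } demo_arg_t;

static void *demo_worker(void *p)
{
    demo_arg_t *a = p;
    for (int i = 0; i < a->iters; i++) {
        qlock_acquire(a->lock);
        (*a->counter)++;
        qlock_release(a->lock);
    }
    return NULL;
}

/* Increment a shared counter from several threads under the toy lock. */
long qlock_demo(int nthreads, int iters)
{
    qlock_t lock;
    long counter = 0;
    pthread_t tid[16];
    demo_arg_t arg = { &lock, &counter, iters };
    int created = 0;

    assert(nthreads > 0 && nthreads <= 16);
    qlock_init(&lock);
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tid[created], NULL, demo_worker, &arg) == 0)
            created++;
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

In this model the hang described above would correspond to qlock_release() taking the else branch while sleepers are still queued.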


Comment 12 Arjan van de Ven 2001-09-06 11:28:12 UTC
OK, I have been running the test for an hour now with our latest kernel, and it seems to work
(i.e. it keeps running).

Comment 13 Shinya Narahara 2001-10-31 07:50:01 UTC
We tested "pttst" again on RH7.1 for Itanium, with the updated packages kernel-2.4.9-6smp
and glibc-2.2.4-19. Unfortunately, we still have the same issue on that system.
Where is your latest (and tested) kernel? Could you please specify the package versions
on your test system, especially kernel and glibc?

Again, the test machine must have more than 2 CPUs and an SMP kernel.


Comment 14 Shinya Narahara 2001-11-01 07:58:22 UTC
If the machine has 2 CPUs, it may be hard to reproduce this issue.
We have tested this on machines with 4 CPUs and 8 CPUs. On the machine with
8 CPUs, we can reproduce it more easily.


Comment 15 Bugzilla owner 2004-09-30 15:39:00 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/