Bug 40533
Summary: | Simple pthread program is stopped.
---|---
Product: | [Retired] Red Hat Linux
Component: | kernel
Version: | 7.3
Hardware: | ia64
OS: | Linux
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | high
Reporter: | Shinya Narahara <naraha_s>
Assignee: | Arjan van de Ven <arjanv>
CC: | jakub
Doc Type: | Bug Fix
Last Closed: | 2004-09-30 15:39:00 UTC
Description
Shinya Narahara
2001-05-14 12:43:11 UTC
Created attachment 18273
my test program
Detaching a joined thread makes no sense, but the problem is reproducible even without the detach. While debugging, I saw the thread descriptor (at the top of the thread stack) get cleared at some point between calling __clone2 and returning from it. Apparently the child ran in between, and the thread descriptor was still intact at the point where the child called _exit, but when __clone2 returned it was all zeros. Vanilla 2.4.4 does not exhibit this problem.

Fixed in kernel-2.4.3-6.99.1.

In most cases, the program above works well on the new kernel-2.4.3-12. But sometimes we see very large system time when running another program that allocates large blocks of memory with malloc(). For such allocations, glibc uses the kernel's mmap (as you know, mmap is a very slow function). We tried to work around this by setting the environment variables MALLOC_MMAP_MAX and MALLOC_TRIM_THRESHOLD as below:

export MALLOC_MMAP_MAX=0 # Don't use mmap
export MALLOC_TRIM_THRESHOLD=4194304

and by changing HEAP_MAX_SIZE to (8*1024*1024) in malloc.c in glibc-2.2.2-10. The results are:

1) The program that creates 1 thread and allocates large memory with malloc() works well.
2) The program that creates 2 threads and allocates large memory with malloc() "almost" works well; it is sometimes very slow.
3) On a 2-way SMP Pentium 3 machine, programs 1) and 2) work well.
4) On 3-way and 8-way Itanium machines, programs 1) and 2) "almost" work well; they are sometimes very slow.

So a machine with more than 2 CPUs still has the issue above. Is this a pthread bug? Or is it just a matter of kernel/shell/glibc settings?

You can get the test programs from:
ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools/ncbi.tar.gz
ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools/data.tar.gz

This defect is considered MUST-FIX for the Fairfax gold release.

Does this persist on 2.4.7-2 or later?

Yes, unfortunately. It's fine on 2.4.7-2, but not on 2.4.7-2smp. An easy test program is attached below. We suspect this issue may depend on the LinuxThreads pthread library...
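For readers who want to try the malloc tuning discussed above from inside a program rather than from the shell, the following is a minimal sketch using glibc's mallopt() interface with the same two values (mmap disabled, trim threshold of 4194304 bytes). It is not the reporter's code: the helper names apply_malloc_tuning and big_alloc_roundtrip are hypothetical, and treating mallopt() as equivalent to the environment settings in the report is an assumption. The HEAP_MAX_SIZE change described in the report was made in the glibc source and is not covered here.

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific: mallopt, M_MMAP_MAX, M_TRIM_THRESHOLD */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: apply the two settings from the report via mallopt().
 * mallopt() returns 1 on success and 0 on error.
 * Returns 1 only if both settings were accepted. */
static int apply_malloc_tuning(void)
{
    int ok_mmap = mallopt(M_MMAP_MAX, 0);              /* do not use mmap */
    int ok_trim = mallopt(M_TRIM_THRESHOLD, 4194304);  /* 4 MiB trim threshold */
    return ok_mmap && ok_trim;
}

/* Hypothetical helper: allocate a large block, touch every byte so the
 * pages are really faulted in, then free it.
 * Returns 1 on success, 0 if the allocation failed. */
static int big_alloc_roundtrip(size_t nbytes)
{
    unsigned char *p;
    int ok;

    assert(nbytes > 0);
    p = malloc(nbytes);
    if (p == NULL)
        return 0;
    memset(p, 0xAB, nbytes);
    ok = (p[0] == 0xAB && p[nbytes - 1] == 0xAB);
    free(p);
    return ok;
}
```

A caller would invoke apply_malloc_tuning() once at startup, before any large allocation, and then time calls such as big_alloc_roundtrip(8 * 1024 * 1024) from one or more threads to compare against the untuned behavior.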
Created attachment 29317
Pthread test program
Extract the tgz, run make, and execute "pttst" to run the test program. The program creates 8 threads and performs 10000 context switches, looped 50 times. On an SMP kernel, the program stops because the kernel stops servicing the pthreads.

We examined the glibc (linuxthreads) library, but we could not find the cause. _pthread_alt_unlock() (in linuxthreads/spinlock.c), called by pthread_mutex_unlock(), might have a problem. _pthread_alt_unlock() walks the thread queue to check for sleeping threads and, if needed, wakes a thread in the queue. When the issue occurs, the queue is never checked, so the threads in the queue are never woken.
# Or, the pointer at the head of the queue is suddenly
# cleared to NULL in rare cases.

OK, I have been running the test for an hour now with our latest kernel, and it seems to work (i.e. it keeps running).

We tested "pttst" again on RH7.1 for Itanium, with the updated packages kernel-2.4.9-6smp and glibc-2.2.4-19. Unfortunately we still have the same issue on that system. Where is your latest (and tested) kernel? Could you please specify the package versions on your test system, especially kernel and glibc?

Again, the test machine must have more than 2 CPUs and an SMP kernel. With only 2 CPUs, the issue may be hard to reproduce. We have tested this on 4-CPU and 8-CPU machines. On the 8-CPU machine, we can reproduce it more easily.

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/
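The attached "pttst" source is not reproduced in this report. As a rough illustration of the kind of test described in the comments (several threads forcing context switches through mutex lock/unlock, repeated over many rounds), here is a minimal sketch. It is an assumption about the general shape of such a test, not the reporter's program: the names ring, worker, run_round, and run_test are hypothetical, and the token-passing scheme with one condition variable per thread is a design choice made for this example.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 64

/* Shared state: threads take turns in ring order, so every handoff
 * forces the scheduler to switch to a different thread. */
struct ring {
    pthread_mutex_t lock;
    pthread_cond_t  turn_cv[MAX_THREADS]; /* one per thread */
    int  nthreads;
    long turn;   /* handoffs done so far; thread (turn % nthreads) runs next */
    long total;  /* handoffs to perform in this round */
};

struct worker_arg { struct ring *ring; int id; };

static void wake_all(struct ring *r)
{
    for (int i = 0; i < r->nthreads; i++)
        pthread_cond_signal(&r->turn_cv[i]);
}

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    struct ring *r = arg->ring;

    assert(arg->id >= 0 && arg->id < r->nthreads);
    pthread_mutex_lock(&r->lock);
    for (;;) {
        /* Sleep until it is our turn or the round is over. */
        while (r->turn < r->total && r->turn % r->nthreads != arg->id)
            pthread_cond_wait(&r->turn_cv[arg->id], &r->lock);
        if (r->turn >= r->total)
            break;
        r->turn++;
        if (r->turn >= r->total)
            wake_all(r);  /* round finished: release every sleeper */
        else
            pthread_cond_signal(&r->turn_cv[r->turn % r->nthreads]);
    }
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* Runs one round. Returns the number of handoffs completed,
 * or -1 on bad arguments or thread creation failure. */
static long run_round(int nthreads, long switches)
{
    struct ring r;
    pthread_t tid[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    int i, started = 0;
    long done;

    if (nthreads < 1 || nthreads > MAX_THREADS || switches < 0)
        return -1;
    pthread_mutex_init(&r.lock, NULL);
    for (i = 0; i < nthreads; i++)
        pthread_cond_init(&r.turn_cv[i], NULL);
    r.nthreads = nthreads;
    r.turn = 0;
    r.total = (long)nthreads * switches;

    for (i = 0; i < nthreads; i++) {
        args[i].ring = &r;
        args[i].id = i;
        if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    if (started < nthreads) {
        /* Could not start every thread: end the round early. */
        pthread_mutex_lock(&r.lock);
        r.total = r.turn;
        wake_all(&r);
        pthread_mutex_unlock(&r.lock);
    }
    /* Threads are joined, never detached. */
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    done = (started == nthreads) ? r.turn : -1;

    for (i = 0; i < nthreads; i++)
        pthread_cond_destroy(&r.turn_cv[i]);
    pthread_mutex_destroy(&r.lock);
    return done;
}

/* Repeats run_round; returns total handoffs, or -1 if any round fell short. */
static long run_test(int nthreads, long switches, int rounds)
{
    long sum = 0;
    for (int i = 0; i < rounds; i++) {
        long done = run_round(nthreads, switches);
        if (done != (long)nthreads * switches)
            return -1;
        sum += done;
    }
    return sum;
}
```

Calling run_test(8, 10000, 50) mirrors the thread count, switch count, and loop count quoted in the report. A hang of the kind described would show up as the call never returning, so it is worth running under a timeout on an SMP machine; link with -pthread.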