Using kernel-2.6.9-34.EL, running pstack on the attached test case takes 10 minutes. Using the upstream 2.6.16, running pstack takes about 48 seconds.
Created attachment 128084 [details] test.c
I am unable to reproduce the bug. The attached test case (id=128084) calls sleep(3000) from every thread, and the pthread_join() calls wait until all the child threads finish executing sleep(3000). So I don't see how this can complete in just 48 seconds on the upstream kernel. My test results behaved exactly the same way on both the upstream (2.6.17-rc3) and RHEL4 kernels. Can you retest your test case and let me know the results?
The problem is that the pstack command takes a long time to collect the stack information, not that the test case itself runs for a long time. Steps to reproduce:

1. Run the test case on ia64:
   # gcc -o test test.c -lpthread
   # ./test
2. Run pstack against the test case:
   # ps ax | grep test
   29272 pts/1 Sl 0:00 ./test
   # pstack 29272

Actual result: pstack takes 10 minutes or more.
Expected result: pstack takes much less time. At least the same performance as 2.6.16 (48 seconds or so) is expected.

Additional info: on an x86 environment, pstack takes only 3 seconds.

Below is part of the log:

[root@dhcp109 92010]# uname -a
Linux dhcp109.tokyo.redhat.com 2.6.9-34.EL #1 SMP Fri Feb 24 16:49:08 EST 2006 ia64 ia64 ia64 GNU/Linux
[root@dhcp109 92010]# ./test &>/dev/null &
[1] 24359
[root@dhcp109 92010]# time pstack 24359
Thread 64 (Thread 2305843009227338368 (LWP 24360)):
#0 0xa000000000010641 in __kernel_syscall_via_break ()
#1 0x2000000000196ce0 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2 0x2000000000196970 in sleep () from /lib/tls/libc.so.6.1
#3 0x4000000000000a10 in counter ()
#4 0x200000000004d7f0 in start_thread () from /lib/tls/libpthread.so.0
#5 0x20000000002139f0 in __clone2 () from /lib/tls/libc.so.6.1
...
Thread 1 (Thread 2305843009216724992 (LWP 24359)):
#0 0xa000000000010641 in __kernel_syscall_via_break ()
#1 0x200000000004fd50 in pthread_join () from /lib/tls/libpthread.so.0
#2 0x4000000000000cc0 in main ()

real 10m48.560s
user 0m8.170s
sys 10m40.252s
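For readers without access to attachment 128084, the shape of the test case can be sketched roughly as below. This is not the attached test.c, only a reconstruction from what this thread says about it: many threads blocked in sleep(3000) inside a function that the backtrace shows as counter(), with the main thread waiting in pthread_join(). The helper name spawn_sleepers and its parameters are hypothetical and exist only so the thread count and sleep time can be varied.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Thread body: block in sleep(), like the counter() frame in the backtrace.
 * The real counter() in test.c may do more; this is an assumption. */
void *counter(void *arg)
{
    unsigned int secs = *(unsigned int *)arg;

    sleep(secs);
    return NULL;
}

/* Hypothetical helper: start nthreads sleeping threads, then join them all.
 * Returns the number of threads successfully joined, or -1 on allocation
 * failure. */
int spawn_sleepers(int nthreads, unsigned int secs)
{
    pthread_t *tids;
    int i, started = 0, joined = 0;

    assert(nthreads > 0);

    tids = malloc((size_t)nthreads * sizeof(*tids));
    if (tids == NULL)
        return -1;

    /* secs stays valid until this function returns, which happens only
     * after every started thread has been joined. */
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, counter, &secs) != 0)
            break;
        started++;
    }

    /* The main thread waits here, like the pthread_join() frame of Thread 1. */
    for (i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) == 0)
            joined++;
    }

    free(tids);
    return joined;
}
```

Calling spawn_sleepers(63, 3000) from main() and linking with -lpthread would give 64 threads in total, counting the main thread, which matches the 64 threads in the pstack output above. pstack can then be timed against that process while the threads sleep.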
Created attachment 132202 [details] speedup ptrace by avoiding kernel stack walk

Without the patch, pstack on 64 threads on Montecito takes:

real 8m26.066s
user 0m7.681s
sys 8m18.154s

With the above patch, pstack on the same 64 threads on Montecito takes the following, which is equivalent to the 2.6.17 kernel time:

real 1m19.469s
user 0m7.805s
sys 1m11.673s
Could we get devel_ack here? Anil created a patch for the fix, and Fujitsu has already verified it.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Committed in stream U5 build 42.14. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0304.html