The following has been reported by IBM LTC:

pstack unable to see threads when using LinuxThreads

Hardware Environment:
Software Environment:

Steps to Reproduce:
1. export LD_ASSUME_KERNEL=2.4.19
2. Start a multi-threaded program.
3. Run pstack against the program's pid.

Actual Results: Only the stack for one thread of the app is shown.
Expected Results: Stacks for all threads of the app should be shown.

Additional Information: gdb works fine in LinuxThreads mode, but pstack does not. No errors are reported; only one stack is shown.

More Information:

Description of pstack: pstack is needed for collecting debug information (call stacks) when the Domino server fails at a customer site. This information is used to diagnose the root cause of the failure in order to provide a fix for the customer problem.

Platforms: We need a binary version of pstack for all platforms supported by Domino (currently Intel, zSeries).

Installation: We would like pstack to be installed by default.

We use pstack for first failure data collection. The Domino product is a collection of multi-threaded programs all running together, sharing resources via shared memory. When we have a failure, one program may fail while another program has caused the failure. To aid us in debugging these crashes, we have an automated test tool, "nsd", which uses pstack on Linux to obtain the stacks of all threads in all programs. No other tool gives us this ability. Without this feature, remote debugging (i.e., at customer sites) would be next to impossible except for the simplest of failures, which we usually catch in-house. It is likely that other groups will be able to make use of pstack in the same way we do once it is operational.

Glen/Greg - this is a pstack problem. Please submit this to Red Hat. Thanks.
pstack needs a few updates to match linuxthreads changes. I will work on these.
*** Bug 106656 has been marked as a duplicate of this bug. ***
The fixes to support newer linuxthreads have been included in Update-1. The only platform addressed is the x86 architecture.
------ Additional Comments From kenbo.com 2003-21-11 12:30 ------- Got the link and downloaded the test pstack from RH and it was not good. Running this pstack, as soon as it hits a java thread within a Domino process it hangs the entire system. Since it's a hang, there is no netdump, but everything is hung, including the console of the system.
If the system is hung, that is by definition a kernel issue. We may be able to learn more if it's possible to use Alt-SysRq on the console when wedged: the t command will show the nature of the wedge, and the c command will induce a netdump. Also, you might be able to boot with nmi_watchdog=1 and get a netdump from the wedge that way (depending on the nature of the wedge).
------ Additional Comments From khoa.com 2003-03-12 18:29 ------- Kenbo - if we can get from you a multi-threaded program that you know will cause pstack to see only one thread, then the India team can help debug it. I know the India team can write a multi-threaded program, but it would be great if you can provide us a program and instructions to make sure that we don't waste time trying to recreate it. Thanks.
------ Additional Comments From ssant.com 2003-04-12 00:23 ------- Kenneth, I tried using a small multi-threaded program to recreate the problem. When I ran pstack against the pid of the running program, I was able to see the stacks for all the threads. Here is the test program I used:

----------------------- Start Test Program ------------------------------
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <assert.h>

static pthread_mutex_t gbl_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gbl_condv = PTHREAD_COND_INITIALIZER;

/* cleanup handler: release the mutex if the thread is canceled while holding it */
void
waitThread_cleanup(void *arg)
{
    int rc;

    rc = pthread_mutex_unlock(&gbl_mutex);
    assert(rc == 0);
    return;
}

void *
waitThread(void *arg)
{
    int rc;

    pthread_cleanup_push(waitThread_cleanup, NULL);
    rc = pthread_mutex_lock(&gbl_mutex);
    assert(rc == 0);

    /* wait until this thread is canceled */
    while (1 == 1) {
        rc = pthread_cond_wait(&gbl_condv, &gbl_mutex);
        assert(rc == 0);
    }

    /* this routine never reaches this point */
    rc = pthread_mutex_unlock(&gbl_mutex);
    assert(rc == 0);
    pthread_cleanup_pop(0);
    return NULL;
}

int
main(int argc, char *argv[])
{
    int i, rc;
    pthread_t wait_tid;

    /* repeatedly create a waiting thread, cancel it after 3 seconds, and join it */
    for (i = 0; i < 1000000; i++) {
        fprintf(stderr, "loop %d ", i);
        rc = pthread_create(&wait_tid, NULL, waitThread, NULL);
        assert(rc == 0);
        sleep(3);
        rc = pthread_cancel(wait_tid);
        assert(rc == 0);
        rc = pthread_join(wait_tid, NULL);
        assert(rc == 0);
    }
    return 0;
}
----------------------- End Test Program ------------------------------

Here is the output of the pstack command I got. The ps -ef | grep mtcond command showed three threads running, with pids 32311, 32312 and 32368. I used pstack with pid 32311.
#ps -ef | grep mtcond
root 32312 32311 0 10:06 tty3 00:00:00 ./mtcond
root 32311 4765 0 10:06 tty3 00:00:00 ./mtcond
root 32368 32312 0 10:08 tty3 00:00:00 ./mtcond
root 32370 32241 0 10:08 tty2 00:00:00 grep mtcond
#
#
#pstack 32311
32311: ./mtcond
----- Thread 32311 -----
0x400f35b1: __nanosleep + 0x11 (3, 0, 804876c, 0, 40016b4c, bfffe5f4) + 10
0x08048872: main + 0x7a (1, bfffe5f4, bfffe5fc, 804853e, 8048930, 0) + 20
0x40054657: __libc_start_main + 0x93 (80487f8, 1, bfffe5f4, 8048528, 8048930, 4000dcd4) + 40001a18
----- Thread 32312 -----
0x4011e3f7: __poll + 0x23 (804b9c4, 1, 7d0, 4003514c, 0, 3) + 150
0x40029920: __pthread_manager + 0x17c (3, 0, 4e1, 0, 0, 0) + f7fb44cc
----- Thread 32368 -----
0x40066bb5: __sigsuspend + 0x21 (409739cc, 20, 409739cc, 0, 0, 0) + 90
0x4002c1d9: __pthread_wait_for_restart_signal + 0x59 (40973be0, 80499c0, 40973aa4, 40028b46, 80499b8, 0) + 20
0x40028bdc: pthread_cond_wait + 0x118 (80499c0, 80499a8, 0, 0, 8048730, 0) + 20
0x080487d2: waitThread + 0x66 (0, 40973c84, 0, 40029bb1, 0, 0) + d0
0x40029c6f: pthread_start_thread + 0x16f (40973be0, 40973be0, 0, 0, 0, 0) + bf68c40c

Is this the expected output? Can you paste the output you got after running the pstack command? Also, when you run nm on the thread library, does it show __pthread_threads_debug as an exported symbol? I am using pstack-1.1-1 on RHAS 2.1. Maybe I will try this on a RHEL system as well.
------ Additional Comments From kenbo.com 2003-04-12 10:04 ------- In order to reproduce this bug, you need to do the following on a RHEL 3.0 system:

1. About 30 minutes from now (still uploading linux.tar), ftp to the ltc ftp server and download from /kenbo:
   a) linux.tar (~715Mb)
   b) the pstack RPM we got from RH
   c) nsd.sh.pstk2
2. Unpack linux.tar and use the contents (Pre-Release 6.51 daily non production build) to install/setup the Domino server.
3. cd to the notes exec directory (/opt/lotus/notes/latest/linux, for example), rename nsd.sh to nsd.sh.org and copy nsd.sh.pstk2 to nsd.sh (make sure execute bits are set).
4. Install the pstack rpm.
5. In a window, start up Domino from the data directory.
6. Once the server is up and running, in another window run /opt/lotus/bin/nsd.

During the run, when pstack hits a program which has the JVM running within it (such as http, or amgr if running java agents), the OS will hang. If it does not hang, let me know, because it may be that either 1) the JVM is not running within http or amgr at the time, or 2) the nsd is not firing correctly to use the local pstack.
A system hang is a wholly separate kernel bug utterly unrelated to this issue in pstack itself. You need to file that separately as a kernel bug. For us to be able to investigate it, we'll need to be able to reproduce it ourselves, and it's unlikely we can use the Notes installation to do that without a lot of hassle.
------ Additional Comments From khoa.com 2003-08-12 10:26 ------- Sachin - is there any update from your team on this ? Have you been able to recreate the problem on your end ? Thanks.
------ Additional Comments From ssant.com 2003-08-12 22:42 ------- Khoa / Kenneth, sorry for not updating the bug. I was able to download the testcase from the LTC ftp site. Thanks to the Lotus team for that. I was able to install the Domino server on a RHEL 3 system, but had some problems getting it up and running. It turned out that the compat-libstdc++ rpm, which Domino needs, was not installed on the system. Today I will try to recreate the problem and will update the bug report. Thanks
------ Additional Comments From ssant.com 2003-09-12 00:55 ------- OK, finally the setup is up and running, and I was able to recreate the problem. When I used the default pstack which comes with RHEL 3 [ version 1.1-1 ] I could see the stack for only one thread, even though it was a multi-threaded program. Then I upgraded pstack to the one RH had supplied [ version 1.2-3.EL.1 ]. Using this pstack I could see stack information for all the threads, but with some threads it hangs the machine. I cannot use the keyboard or mouse, but I can ping the machine, so it seems that it is not a total hang. After playing around with pstack I found that if I display information about processes like amgr, sched, ldap, calconn, replica, router and adminp, I get stack information about all the threads. The problem is only with the event and server processes: if I try to display stack info for these two processes, it hangs the machine. So it seems pstack has some problem with these processes. I will go through the source of pstack and see if I can find any clues. Thanks
There is no point in examining pstack to understand why the entire system hangs. That is by definition a kernel issue. Please concentrate on ascertaining exactly what system calls pstack makes that lead to the wedge. For example, use strace on pstack with the output going to a terminal you can capture, to get all possible info up to the time of the wedge. This problem must be filed in RH bugzilla against the kernel, though it may be a known issue if you are using the RHEL3-GA kernel.
------ Additional Comments From kenbo.com 2003-09-12 21:26 ------- As suggested by RH, we need to close this bug, as the new pstack they gave us, which will be in Update 1, does see threads now with LinuxThreads. I have opened a new bugzilla - Bug 5608 - in the LTC bugzilla against the kernel hanging when pstack is run, and transferred all of the information from here into it. We should promote that into the RH bugzilla database and close the RH bugzilla relating to this one as well. Sound right y'all? Thanks!
------ Additional Comments From khoa.com 2003-12-10 10:12 ------- Since pstack version 1.2-3.EL.1 addresses this specific problem (seeing all threads), I agree that we can close this bug report now with resolution "Fix_Already_Avail" and note that this new version of pstack will be included in RHEL3 Update1, as Kenbo mentioned above. We'll address the hang problem in Bug 5608. Thanks.