Bug 107305 - LTC4943-pstack unable to see threads when using LinuxThreads
Summary: LTC4943-pstack unable to see threads when using LinuxThreads
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: pstack
Version: 3.0
Hardware: i386
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Roland McGrath
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-10-16 18:07 UTC by IBM Bug Proxy
Modified: 2007-11-30 22:06 UTC (History)

Fixed In Version: 1.2.-3.EL.1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-12-11 00:39:53 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2003:250 | 0 | normal | SHIPPED_LIVE | Pstack bugfix errata | 2003-12-19 05:00:00 UTC
Red Hat Product Errata | RHBA-2003:362 | 0 | normal | SHIPPED_LIVE | Updated pstack packages include several bugfixes | 2003-11-18 05:00:00 UTC

Description IBM Bug Proxy 2003-10-16 18:07:16 UTC
The following has been reported by IBM LTC:
pstack unable to see threads when using LinuxThreads
Hardware Environment:

Software Environment:


Steps to Reproduce:
1. export LD_ASSUME_KERNEL=2.4.19
2. start a multi-threaded program
3. run pstack against the program's pid
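
Before running step 3 it can help to confirm that step 1 really selected LinuxThreads in the target program. The sketch below is one way to check, assuming a glibc that supports the _CS_GNU_LIBPTHREAD_VERSION query to confstr(); the function names threading_impl and report_threading are made up for illustration and are not part of pstack or the reporter's setup.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Copy the threading library's self-description into buf.
 * Returns 0 on success, -1 if the C library does not report one
 * or if buf is too small to hold it. */
int threading_impl(char *buf, size_t len)
{
  assert(buf != NULL && len > 0);
#ifdef _CS_GNU_LIBPTHREAD_VERSION
  {
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    if (n == 0 || n > len)
      return -1;
    return 0;
  }
#else
  return -1;
#endif
}

/* Print the LD_ASSUME_KERNEL setting and the threading library to stderr. */
void report_threading(void)
{
  char buf[64];
  const char *assume = getenv("LD_ASSUME_KERNEL");

  fprintf(stderr, "LD_ASSUME_KERNEL=%s\n", assume ? assume : "(unset)");
  if (threading_impl(buf, sizeof buf) == 0)
    fprintf(stderr, "threads: %s\n", buf);
  else
    fprintf(stderr, "threads: unknown\n");
}
```

Calling report_threading() at the start of the test program's main should print a string naming either linuxthreads or NPTL, which tells you which library pstack will be looking at.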

Actual Results:
only see stack for one thread of app

Expected Results:
should see stacks for all threads of the app

Additional Information:
gdb works fine in LinuxThreads mode, but pstack does not.  No errors are 
reported; only one stack is shown.

More Information:

Description of pStack: 
pstack is needed for collecting debug information (call stacks) when the 
Domino server fails at a customer site.  This information is used to 
diagnose the root cause of the failure in order to provide a fix for the 
customer problem.

Platforms: 
We need a binary version of pStack for all platforms supported by Domino.  
(currently Intel, zSeries) 

Installation: 
We would like pStack to be installed by default.

We use pstack for first failure data collection.  The Domino product is a 
collection of multi-threaded programs all running together sharing resources 
via shared memory.  

When we have a failure, one program may fail even though another program 
caused the failure.  

In order to aid us in debugging these crashes, we have an automated test 
tool "nsd" which uses pstack on Linux to obtain the stacks of all threads in 
all programs.  

No other tool gives us this ability.  Without this feature, remote debugging, 
i.e. at customer sites, would be next to impossible except for the simplest 
of failures, which we usually catch in-house. 

It is likely that other groups will be able to make use of pstack in the same 
way we do once it is operational.

 Glen/Greg - this is a pstack problem.  Please submit this to Red Hat.  Thanks.

Comment 1 Roland McGrath 2003-11-09 00:39:53 UTC
pstack needs a few updates to match linuxthreads changes.
I will work on these.


Comment 4 Roland McGrath 2003-11-11 22:24:43 UTC
*** Bug 106656 has been marked as a duplicate of this bug. ***

Comment 5 Karen Bennet 2003-11-17 19:17:50 UTC
The fixes to support newer linuxthreads have been included in
Update-1. The only platform addressed is the x86 architecture.


     

Comment 8 mark wisner 2003-11-21 17:30:55 UTC
------ Additional Comments From kenbo.com  2003-21-11 12:30 -------
Got the link and downloaded the test pstack from RH and it was not good.  
Running this pstack, as soon as it hits a java thread within a Domino process 
it hangs the entire system.  Since it's a hang, there is no netdump, but 
everything is hung, including the console of the system. 

Comment 9 Roland McGrath 2003-11-21 20:55:38 UTC
If the system is hung that is by definition a kernel issue.
We may be able to learn more if it's possible to use Alt-SysRq on
the console when wedged: the t command will show the nature of the wedge,
and the c command will induce a netdump.  Also, you might be able to
boot with nmi_watchdog=1 and get a netdump from the wedge that way
(depending on the nature of the wedge).
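
If the console keyboard is dead but a remote shell still responds, the same SysRq commands can sometimes be issued from software. The sketch below assumes the running kernel provides the /proc/sysrq-trigger file (not confirmed here for the RHEL 3 kernel) and that the caller is root; sysrq_write is a made-up helper name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write one SysRq command character (for example 't' or 'c') to the
 * trigger file at path.  Returns 0 on success, or -1 with errno set. */
int sysrq_write(const char *path, char cmd)
{
  int fd, saved;
  ssize_t n;

  assert(path != NULL);

  fd = open(path, O_WRONLY);
  if (fd < 0)
    return -1;

  n = write(fd, &cmd, 1);
  saved = errno;          /* keep the write error across close() */
  close(fd);
  errno = saved;

  return n == 1 ? 0 : -1;
}
```

For example, sysrq_write("/proc/sysrq-trigger", 't') asks for the task dump. Note that the c command deliberately crashes the machine, so only send it when a dump is really wanted.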


Comment 10 mark wisner 2003-12-04 00:23:59 UTC
------ Additional Comments From khoa.com  2003-03-12 18:29 -------
Kenbo - if we can get from you a multi-threaded program that you know will
cause pstack to see only one thread, then the India team can help debug it.
I know the India team can write a multi-threaded program, but it would be
great if you can provide us a program and instructions to make sure that
we don't waste time trying to recreate it.  Thanks. 

Comment 11 mark wisner 2003-12-04 05:26:25 UTC
------ Additional Comments From ssant.com  2003-04-12 00:23 -------
Kenneth, I tried using a small multi-threaded program to recreate the problem.
When I ran pstack against the pid of the running program I was able to see the 
stacks for all the threads. Here is the test program I used:

----------------------- Start Test Program ------------------------------

#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <assert.h>

static pthread_mutex_t gbl_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gbl_condv = PTHREAD_COND_INITIALIZER;

void waitThread_cleanup(void *arg)
{
  int rc;

  rc = pthread_mutex_unlock(&gbl_mutex);
  assert(rc == 0);

  return;
}

void * waitThread(void *arg)
{
  int rc;

  pthread_cleanup_push(waitThread_cleanup, NULL);

  rc = pthread_mutex_lock(&gbl_mutex);
  assert(rc == 0);

  /* wait until this thread is canceled */

  while (1 == 1) {
    rc = pthread_cond_wait(&gbl_condv, &gbl_mutex);
    assert(rc == 0);
  }

  /* this routine never reaches this point */

  rc = pthread_mutex_unlock(&gbl_mutex);
  assert(rc == 0);

  pthread_cleanup_pop(0);

  return NULL;
}

int main (int argc, char *argv[])
{
  int i, rc;
  pthread_t wait_tid;

  for (i = 0; i < 1000000; i++)
    {
      fprintf(stderr, "loop %d\n", i);

      /* start a thread that blocks in pthread_cond_wait */
      rc = pthread_create(&wait_tid, NULL, waitThread, NULL);
      assert(rc == 0);
      sleep (3);
      /* cancel it; the cleanup handler releases gbl_mutex */
      rc = pthread_cancel(wait_tid);
      assert(rc == 0);

      rc = pthread_join(wait_tid, NULL);
      assert(rc == 0);
    }

  return 0;
}

----------------------- End Test Program ------------------------------

Here is the output of the pstack command I got. The ps -ef | grep mtcond command 
showed three threads running, with pids 32311, 32312 and 32368. I used pstack 
with pid 32311.

#ps -ef | grep mtcond
root     32312 32311  0 10:06 tty3     00:00:00 ./mtcond
root     32311  4765  0 10:06 tty3     00:00:00 ./mtcond
root     32368 32312  0 10:08 tty3     00:00:00 ./mtcond
root     32370 32241  0 10:08 tty2     00:00:00 grep mtcond
#
#
#pstack 32311
32311: ./mtcond
----- Thread 32311 -----
0x400f35b1: __nanosleep + 0x11 (3, 0, 804876c, 0, 40016b4c, bfffe5f4) + 10
0x08048872: main + 0x7a (1, bfffe5f4, bfffe5fc, 804853e, 8048930, 0) + 20
0x40054657: __libc_start_main + 0x93 (80487f8, 1, bfffe5f4, 8048528, 8048930, 4000dcd4) + 40001a18
----- Thread 32312 -----
0x4011e3f7: __poll + 0x23 (804b9c4, 1, 7d0, 4003514c, 0, 3) + 150
0x40029920: __pthread_manager + 0x17c (3, 0, 4e1, 0, 0, 0) + f7fb44cc
----- Thread 32368 -----
0x40066bb5: __sigsuspend + 0x21 (409739cc, 20, 409739cc, 0, 0, 0) + 90
0x4002c1d9: __pthread_wait_for_restart_signal + 0x59 (40973be0, 80499c0, 40973aa4, 40028b46, 80499b8, 0) + 20
0x40028bdc: pthread_cond_wait + 0x118 (80499c0, 80499a8, 0, 0, 8048730, 0) + 20
0x080487d2: waitThread + 0x66 (0, 40973c84, 0, 40029bb1, 0, 0) + d0
0x40029c6f: pthread_start_thread + 0x16f (40973be0, 40973be0, 0, 0, 0, 0) + bf68c40c

Is this the expected output? Can you paste the output you got after running the 
pstack command? Also, when you run nm on the thread library, does it show 
__pthread_threads_debug as an exported symbol?

I am using pstack-1.1-1 on RHAS 2.1. Maybe I will try this on a RHEL system 
also. 
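
For anyone who wants to script the symbol check rather than read nm output, here is a rough sketch that asks the dynamic loader instead. It assumes the thread library is loadable as libpthread.so.0; exports_symbol is a made-up helper name. Note that dlsym on a library handle also searches that library's dependencies, so this is a looser check than nm on the one file.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if the symbol can be resolved through the shared library,
 * 0 if it cannot, and -1 if the library itself cannot be loaded. */
int exports_symbol(const char *lib, const char *sym)
{
  void *handle;
  int found;

  assert(lib != NULL && sym != NULL);

  handle = dlopen(lib, RTLD_LAZY);
  if (handle == NULL)
    return -1;

  dlerror();                      /* clear any stale error */
  (void)dlsym(handle, sym);
  found = (dlerror() == NULL);    /* no error means the lookup succeeded */

  dlclose(handle);
  return found;
}
```

The call of interest here would be exports_symbol("libpthread.so.0", "__pthread_threads_debug"). On older glibc versions the program has to be linked with -ldl.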

Comment 12 mark wisner 2003-12-04 15:08:20 UTC
------ Additional Comments From kenbo.com  2003-04-12 10:04 -------
In order to reproduce this bug, you need to do the following on a RHEL 3.0 
system:

  1. About 30 minutes from now (still uploading linux.tar), ftp to the ltc ftp 
server and download from /kenbo a) linux.tar (~715 MB), b) the pstack RPM we 
got from RH, and c) nsd.sh.pstk2
  2. unpack linux.tar and use the contents (Pre-Release 6.51 daily non 
production build) to install/setup the Domino server
  3. cd to notes exec directory (/opt/lotus/notes/latest/linux, for example), 
rename nsd.sh to nsd.sh.org and copy nsd.sh.pstk2 to nsd.sh (make sure execute 
bits are set)
  4. install the pstack rpm.
  5. in a window, start up Domino from the data directory
  6. once server is up and running, in another window run /opt/lotus/bin/nsd.  
During the run, when pstack hits a program which has the JVM running within it 
(such as http or amgr if running java agents), the OS will hang.

If it does not hang, let me know, because it may be that either 1) the JVM is 
not running within http or amgr at the time, or 2) the nsd is not firing 
correctly to use the local pstack. 

Comment 13 Roland McGrath 2003-12-04 22:52:08 UTC
A system hang is a wholly separate kernel bug utterly unrelated to
this issue in pstack itself.  You need to file that separately as a
kernel bug.  For us to be able to investigate it, we'll need to be
able to reproduce it ourselves, and it's unlikely we can use the Notes
installation to do that without a lot of hassle.

Comment 14 IBM Bug Proxy 2003-12-08 15:26:42 UTC
------ Additional Comments From khoa.com  2003-08-12 10:26 -------
Sachin - is there any update from your team on this ?  Have you been able to
recreate the problem on your end ?  Thanks. 

Comment 17 IBM Bug Proxy 2003-12-09 14:53:28 UTC
------ Additional Comments From ssant.com  2003-08-12 22:42 -------
Khoa / Kenneth, sorry for not updating the bug. I was able to download the 
testcase from the LTC ftp site. Thanks to the Lotus team for that. I was 
able to install the Domino server on a RHEL 3 system, but had some problems 
getting it up and running. It turned out that the compat-libstdc++ rpm was 
not installed on the system, and Domino needs this rpm. Today I will try to 
recreate the problem and will update the bug report.
Thanks 

Comment 18 IBM Bug Proxy 2003-12-09 14:54:05 UTC
------ Additional Comments From ssant.com  2003-09-12 00:55 -------
Ok finally the setup is up and running. I was able to recreate the problem.

When I used the default pstack that comes with RHEL 3 [ version 1.1-1 ] I could 
see the stack for only one thread, even though it was a multi-threaded program.

Then I upgraded pstack to the one RH had supplied [ version 1.2-3.EL.1 ]. 
Using this pstack I could see stack information for all the threads. But with 
some threads it hangs the machine. I cannot use the keyboard or mouse, but I 
can ping the machine, so it seems that it is not a total hang. 

After playing around with pstack I found that if I try to display information 
about processes like amgr, sched, ldap, calconn, replica, router and adminp, I 
get stack information for all the threads. The problem is only with the event 
and server processes. If I try to display stack info for these two processes it 
hangs the machine. So it seems pstack has some problem with these processes.

I will go through the pstack source and see if I can find any clues.

Thanks 

Comment 19 Roland McGrath 2003-12-10 01:51:15 UTC
There is no point in examining pstack to understand why the entire
system hangs.  That is by definition a kernel issue.  Please
concentrate on ascertaining exactly what system calls pstack makes
that lead to the wedge.  For example, use strace on pstack with the
output coming to a terminal you can capture to get all possible info
up to the time of the wedge.  This problem must be filed in RH
bugzilla against the kernel, though it may be a known issue if you are
using the RHEL3-GA kernel.
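
As a complement to strace, a tiny probe can narrow down which call wedges the machine. The sketch below assumes pstack attaches to each thread with ptrace, as stack dumpers of this kind usually do; that has not been checked against the pstack source here, and attach_probe is a made-up name. It logs each step to stderr (which is unbuffered) before attempting it, so the last line captured shows what was in progress.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Attach to pid, wait for it to stop, then detach again.
 * Returns 0 on success, -1 on failure. */
int attach_probe(pid_t pid)
{
  int status = 0;

  fprintf(stderr, "PTRACE_ATTACH %d\n", (int)pid);
  if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
    fprintf(stderr, "attach failed: %s\n", strerror(errno));
    return -1;
  }

  fprintf(stderr, "waitpid %d\n", (int)pid);
  /* __WALL also waits for clone()d tasks such as LinuxThreads threads */
  if (waitpid(pid, &status, __WALL) == -1 || !WIFSTOPPED(status)) {
    fprintf(stderr, "wait failed or target not stopped\n");
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return -1;
  }

  fprintf(stderr, "PTRACE_DETACH %d\n", (int)pid);
  if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1) {
    fprintf(stderr, "detach failed: %s\n", strerror(errno));
    return -1;
  }

  return 0;
}
```

Running attach_probe against each thread pid of the event or server process, one at a time with stderr sent to a remote terminal, should show whether the attach alone is enough to trigger the hang.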


Comment 20 IBM Bug Proxy 2003-12-10 02:26:43 UTC
------ Additional Comments From kenbo.com  2003-09-12 21:26 -------
As suggested by RH, we need to close this bug, because the new pstack they gave 
us, which will be in Update 1, does see threads now with LinuxThreads.  I have 
opened a new bug, Bug 5608, in the LTC bugzilla against the kernel hanging when 
pstack is run, and transferred all of the information from here into it.  We 
should promote that into the RH bugzilla database and close the RH bugzilla 
relating to this one as well.  Sound right y'all?  Thanks! 

Comment 21 IBM Bug Proxy 2003-12-10 15:11:24 UTC
------ Additional Comments From khoa.com  2003-12-10 10:12 -------
Since pstack version 1.2.-3.EL.1 addresses this specific problem (seeing
all threads), I agree that we can close this bug report now with resolution
"Fix_Already_Avail" and note that this new version of pstack will be included
in RHEL3 Update1 as Kenbo mentioned above.  We'll address the hang problem
in Bug 5608.  Thanks. 

