Description of problem:
gdb causes SIGSEGV.

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.EL
gdb-6.1post-1.20040607.62

How reproducible:
always

Steps to Reproduce:
1. Compile the test program (a.c).
   # gcc a.c
-------------------
<test program> a.c
#include <stdlib.h>

int main(void)
{
    abort();
}
-------------------
2. Run the test program (a.out); it dumps core.
   # ./a.out
3. Run gdb on the core.
   # gdb ./core ./core.<PID>

Actual results:
Segmentation fault; we cannot debug our program.
====
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
Copyright 2004 Free Software Foundation, Inc.
(snip)
Core was generated by `./core'.
Program terminated with signal 6, Aborted.
(snip)
Segmentation fault (core dumped)
====

Expected results:
No segmentation fault; we can debug our program.

Additional info:
We cannot debug our middleware at all. Our middleware development for RHEL v4 has been delayed.
=====
# gdb gdb core.21931
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
(snip)
Core was generated by `gdb core core.19066'.
Program terminated with signal 11, Segmentation fault.
warning: svr4_current_sos: Can't read pathname for load map: Input/output error
(snip)
#0  0x4000000000071b80 in ia64_write_pc ()
(gdb) bt
#0  0x4000000000071b80 in ia64_write_pc ()
#1  0x2000000000498300 in _Uia64_find_dyn_list () from /usr/lib/libunwind-ia64.so
#2  0x4000000000102ef0 in libunwind_find_dyn_list ()
#3  0x4000000000072860 in ia64_write_pc ()
(snip)
=====

=====
# tail strace-gdb-core.log
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
--- SIGSEGV (Segmentation fault) @ 4000000000071b80 (8000000000000b60) ---
+++ killed by SIGSEGV (core dumped) +++
=====
Changed "Product" and "Version"
Dear Fujitsu Support team: I need to ask you to do a couple of things on these types of reports. Please be sure to include all information Red Hat needs to review and try to replicate the problem: the architecture, the system and configuration, and whether anything like the 32-bit EL (execution layer) was used. Also, please copy the Fujitsu team on site in Westford, as they are also involved part time in helping with bug resolution. Finally, it is important to open a single bug for a problem report and then update it, rather than opening duplicate bugs for the same problem. In reviewing this one and the new BZ 145309, it appears they may be the same? If they are, please close one and note in each that they are duplicates. For things like RHEL4 beta2 - rc1 - rc2, that should all be under the same BZ #, as we would want you to try the latest version. I hope this helps clarify things; following the above process will improve the response times and resolution of bug reports from Red Hat. Regards, JoAnne
Adding Fujitsu on site team to the cc list.
*** Bug 145092 has been marked as a duplicate of this bug. ***
I talked to Tachino-san and we think the information Fujitsu provided is enough. We reproduced the problem on an ia64 machine here in Westford. I am going to check with the Fujitsu Support team in Japan whether they have any other test cases. Red Hat Tools team, please investigate the problem.
Nagahama-san: Yes, this is a problem on an ia64 machine. We do not use the 32-bit EL. We tried RHEL4-rc. Because the results for RHEL4-Beta2 and RHEL4-rc were different, another bug was filed. Regards, Fujitsu Japan Support team, Yoneda
A patch has been built into gdb-6.3.0.0-0.10 that prevents the SIGSEGV that occurred when running the given test.
We would like to test gdb-6.3.0.0-0.10. Please provide it.
It should be in Rawhide, because it was built for Fedora Core 4. If that works, we'll put it in RHEL4-U1. Can they try that? http://download.fedora.redhat.com/pub/fedora/linux/core/development/i386/Fedora/RPMS/ has the i386 RPMs; analogous directories have the other arches. Alternatively, I can put it on my ftp page.
To clarify, the ia64 RPMs are in: http://download.fedora.redhat.com/pub/fedora/linux/core/development/ia64/Fedora/RPMS/
The problem reported in Bugzilla #145309 ("gdb cause SIGSEGV.") no longer occurs. However, another problem has appeared: part of the backtrace cannot be displayed. The issue is not yet solved for us, because we still cannot debug using gdb, so we will keep reporting this problem under Bugzilla #145309 as circumstances require.

Steps to Reproduce:
1) Compile the attached test program.
   $ gcc -lpthread -o thread thread.c
   If you look at the source you will understand it easily. It is a really simple test program: threads are created and sleep.
2) Execute the program.
   $ ./thread &
   [1] 13884
3) Gather a core while it is running.
   $ gcore 13884
4) Examine the core with gdb.
   $ gdb ./thread core.13884

Actual results:
=====
GNU gdb Red Hat Linux (6.3.0.0-0.10rh)
(Omitted)
Core was generated by `/work/testpro/thread'.
(Omitted)
warning: svr4_current_sos: Can't read pathname for load map: Input/output error
(Omitted)
#0  0xa000000000010641 in ?? ()
(gdb) bt
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x0000000000000000 in ?? ()
(gdb) thread apply all bt

Thread 3 (process 13884):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000090110 in default_attr () from /lib/tls/libpthread.so.0

Thread 2 (process 13886):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x0000000000000000 in ?? ()

Thread 1 (process 13885):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x0000000000000000 in ?? ()
=====

Expected results:
The following results are from an RHEL v4 + i386 machine. On the RHEL v4 + IPF machine we expect all backtraces to be shown like this.
=====
$ gdb ./thread core.16397
GNU gdb Red Hat Linux (6.3.0.0-0.10rh)
Copyright 2004 Free Software Foundation, Inc.
(Omitted)
#0  0x00d2b15c in __nanosleep_nocancel () from /lib/tls/i686/libc.so.6
(gdb) bt
#0  0x00d2b15c in __nanosleep_nocancel () from /lib/tls/i686/libc.so.6
#1  0x00d2af97 in sleep () from /lib/tls/i686/libc.so.6
#2  0x08048592 in threadA ()
#3  0x00c5e0dd in start_thread () from /lib/tls/i686/libpthread.so.0
#4  0x00d62a2e in clone () from /lib/tls/i686/libc.so.6
(gdb) thread apply all bt

Thread 3 (process 16397):
#0  0x00d2b15c in __nanosleep_nocancel () from /lib/tls/i686/libc.so.6
#1  0x00d2af97 in sleep () from /lib/tls/i686/libc.so.6
#2  0x080484fe in main ()

Thread 2 (process 16399):
#0  0x00d2b15c in __nanosleep_nocancel () from /lib/tls/i686/libc.so.6
#1  0x00d2af97 in sleep () from /lib/tls/i686/libc.so.6
#2  0x080485d2 in threadB ()
#3  0x00c5e0dd in start_thread () from /lib/tls/i686/libpthread.so.0
#4  0x00d62a2e in clone () from /lib/tls/i686/libc.so.6

Thread 1 (process 16398):
#0  0x00d2b15c in __nanosleep_nocancel () from /lib/tls/i686/libc.so.6
#1  0x00d2af97 in sleep () from /lib/tls/i686/libc.so.6
#2  0x08048592 in threadA ()
#3  0x00c5e0dd in start_thread () from /lib/tls/i686/libpthread.so.0
#4  0x00d62a2e in clone () from /lib/tls/i686/libc.so.6
=====

Influence:
If the backtrace of even a simple test program like the one attached cannot be shown, the middleware cannot be debugged. In fact, we cannot debug our middleware at all, and our middleware development for RHEL v4 has been delayed.
Steps to Reproduce:
1) Compile the attached test program.
   $ gcc -lpthread -o thread thread.c
   If you look at the source you will understand it easily. It is a really simple test program: threads are created and sleep.
2) Execute the program.
   $ ./thread &
   [1] 13884
3) Gather a core while it is running.
   $ gcore 13884
4) Examine the core with gdb.
   $ gdb ./thread core.13884

cat thread.c
=====
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void *threadA(void *tname);
static void *threadB(void *tname);

int main(void)
{
    pthread_t thrdidA;
    pthread_t thrdidB;
    int ret;
    void *status;

    printf("TEST START\n");
    if ((ret = pthread_create(&thrdidA, NULL, threadA, (void *)"THREAD-A"))) {
        printf(" pthread_create ERROR errno=%d\n", ret);
    }
    if ((ret = pthread_create(&thrdidB, NULL, threadB, (void *)"THREAD-B"))) {
        printf(" pthread_create ERROR errno=%d\n", ret);
    }
    sleep(10);
    if ((ret = pthread_join(thrdidA, &status))) {
        printf(" pthread_join ERROR errno=%d\n", ret);
    }
    if ((ret = pthread_join(thrdidB, &status))) {
        printf(" pthread_join ERROR errno=%d\n", ret);
    }
    printf("TEST END\n");
    return 0;
}

static void *threadA(void *tname)
{
    printf("%s START\n", (char *)tname);
    sleep(10);
    printf("%s END\n", (char *)tname);
    return NULL;
}

static void *threadB(void *tname)
{
    printf("%s START\n", (char *)tname);
    sleep(10);
    printf("%s END\n", (char *)tname);
    return NULL;
}
=====
We tested gdb-6.3.0.0-0.13.ia64.rpm. Our "Expected results" have still not been achieved. In Case 1 ("refer to the corefile") the SIGSEGV no longer appears, but part of the stack cannot be displayed. In Case 2 ("attach gdb to the process while it is running") the stack can be displayed. We expect all stacks to be displayable when referring to the corefile with gdb.

[Steps to Reproduce:]
=====
Case 1: "Refer to the corefile"
1) Compile the attached test program.
   $ gcc -lpthread -o thread thread.c
   If you look at the source you will understand it easily. It is a really simple test program: threads are created and sleep.
2) Execute the program.
   $ ./thread &
   [1] 13884
3) Gather a core while it is running.
   $ gcore 13884
4) Examine the core with gdb.
   $ gdb ./thread core.13884
GNU gdb Red Hat Linux (6.3.0.0-0.13rh)
(gdb) bt
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
Previous frame inner to this frame (corrupt stack?)
(gdb) thread apply all backtrace

Thread 3 (process 6704):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000090110 in default_attr () from /lib/tls/libpthread.so.0
#3  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
Previous frame inner to this frame (corrupt stack?)

Thread 2 (process 6706):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
Previous frame inner to this frame (corrupt stack?)

Thread 1 (process 6705):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
Previous frame inner to this frame (corrupt stack?)
=====

[Actual results:]
The following error messages a) and b) appear, and part of the stack cannot be displayed.
a) "Previous frame inner to this frame (corrupt stack?)"
b) "warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead"

[Expected results:]
Case 2, "attach gdb to the process while it is running", shows the output we expect:
1) Execute the same program.
   $ ./thread &
   [1] 14070
2) Attach gdb to the process.
   $ gdb -p 14070
GNU gdb Red Hat Linux (6.3.0.0-0.13rh)
(gdb) bt
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x20000000001c4090 in sleep () from /lib/tls/libc.so.6.1
#3  0x40000000000009e0 in main ()
(gdb) thread apply all backtrace

Thread 3 (Thread 2305843009227420288 (LWP 14071)):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x20000000001c4090 in sleep () from /lib/tls/libc.so.6.1
#3  0x4000000000000b90 in threadA ()
#4  0x200000000007d630 in start_thread () from /lib/tls/libpthread.so.0
#5  0x200000000023ef90 in __clone2 () from /lib/tls/libc.so.6.1

Thread 2 (Thread 2305843009237906048 (LWP 14072)):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x20000000001c4090 in sleep () from /lib/tls/libc.so.6.1
#3  0x4000000000000c40 in threadB ()
#4  0x200000000007d630 in start_thread () from /lib/tls/libpthread.so.0
#5  0x200000000023ef90 in __clone2 () from /lib/tls/libc.so.6.1

Thread 1 (Thread 2305843009216872448 (LWP 14070)):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x20000000001c4090 in sleep () from /lib/tls/libc.so.6.1
#3  0x40000000000009e0 in main ()

"__clone2", "threadA", etc. are displayed. This is the result we expect!
=====

If all stacks cannot be displayed even when referring to the corefile, gdb's debugging function is insufficient.

[Additional info:]
You said: "it (gdb-6.3.0.0-0.10) should be in rawhide, because it was built for fedora core4. If that works we'll put in RHEL4-U1." However, RHEL4-U1 is too late, because gdb versions before gdb-6.3.0.0-0.10 do not work! If you were a customer, would you use a gdb that shows only the signal handler and that itself dumps core? We strongly hope a working gdb will be provided and included in RHEL4-GA. We do not have much time left. Please do it as fast as you can. Regards, Fujitsu Japan Support team, Yoneda
Created attachment 110637 [details] test-program
A fix has been built into gdb-6.3.0.0-0.16. A mechanism used by gcore to read larger chunks of storage at a time is not working properly for threads. This fix falls back to an older, reliable mechanism that can only read small chunks at a time.
I read #147436 and learned about the problems with /proc/yyyy/mem and /proc/yyyy/task/xxxx/mem (yyyy is the main process pid and xxxx is the LWP of the thread). Then I tested gdb-6.3.0.0-0.16.ia64.rpm.

[Expected Result]
We expect all stacks to be displayed when referring to the corefile with gdb. I do not mind it taking more time by using PTRACE.

[Actual Result]
When referring to the corefile, part of the stack still cannot be displayed (same as gdb-6.3.0.0-0.13).

Will gdb-6.3.0.0-0.16 work fine once #147436 is solved? Or does gdb-6.3.0.0-0.16 still have other problems? Regards, Fujitsu Japan Support team, Yoneda
Problem reproduced. In my testing, I was issuing a "thread 2" command before running gcore. If you issue "thread 1" in your test, it should work fine with 0.16. What is happening is that there is still a call that reads /proc/xxx/mem storage, but it is meant to be invoked only for the main thread of a non-threaded program (i.e. so as not to hinder gcore's performance on an unthreaded program). That test is faulty and mistook your initial conditions above for the non-threaded case. Therefore, I have simply removed the call that reads /proc mem for the time being, and rebuilt the fix into gdb-6.3.0.0-0.18. When the kernel fix is made I will restore the call.
I tested gcore with 0.18. It works very slowly, but it works fine with my test program. Thanks, Fujitsu Japan Support team, Yoneda
Refer to Issue 66089.
gdb-6.3.0.0-0.31.ia64 has been tested and confirmed to resolve all issues shown above.
*** Bug 150068 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-241.html