Description of problem:
gdb causes SIGSEGV.

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.EL
gdb-6.1post-1.20040607.62

How reproducible:
always

Steps to Reproduce:
1. Compile the test program (a.c).
   # gcc a.c
-------------------
<test program> a.c
#include <stdlib.h>

int main(void)
{
    abort();
}
-------------------
2. Run the test program (a.out); it dumps core.
   # ./a.out
3. Run gdb on the core.
   # gdb ./core ./core.<PID>

Actual results:
Segmentation fault; we cannot debug our program.
====
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
Copyright 2004 Free Software Foundation, Inc.
(snip)
Core was generated by `./core'.
Program terminated with signal 6, Aborted.
(snip)
Segmentation fault (core dumped)
====

Expected results:
No segmentation fault; we can debug our program.

Additional info:
We cannot debug our middleware at all. Our middleware development for RHEL v4 has been delayed.
=====
# gdb gdb core.21931
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
(snip)
Core was generated by `gdb core core.19066'.
Program terminated with signal 11, Segmentation fault.
warning: svr4_current_sos: Can't read pathname for load map: Input/output error
(snip)
#0  0x4000000000071b80 in ia64_write_pc ()
(gdb) bt
#0  0x4000000000071b80 in ia64_write_pc ()
#1  0x2000000000498300 in _Uia64_find_dyn_list () from /usr/lib/libunwind-ia64.so
#2  0x4000000000102ef0 in libunwind_find_dyn_list ()
#3  0x4000000000072860 in ia64_write_pc ()
(snip)
=====

=====
# tail strace-gdb-core.log
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
lseek(7, 163840, SEEK_SET) = 163840
--- SIGSEGV (Segmentation fault) @ 4000000000071b80 (8000000000000b60) ---
+++ killed by SIGSEGV (core dumped) +++
=====
Changed "Product" and "Version"
Dear Fujitsu Support team: I need to ask you to do a couple of things on these types of reports. Please be sure to include all information Red Hat needs to review and try to replicate the problem: the architecture, the system and configuration, and whether anything like the 32-bit EL (execution layer) was used. Also, please copy the Fujitsu team on site in Westford, as they are also involved part time in helping with bug resolution. Finally, it is important to open a single bug for a problem report and then update it, rather than opening duplicate bugs for the same problem. In reviewing this one and the new BZ 145309, it appears they may be the same? If they are, please close one and note in each that they are duplicates. For things like RHEL4 beta2 - rc1 - rc2, that should all be under the same BZ #, as we would want you to try the latest version. I hope this helps clarify things; following the above process will improve the response times and resolution of bug reports from Red Hat. Regards, JoAnne
Adding Fujitsu on site team to the cc list.
*** Bug 145092 has been marked as a duplicate of this bug. ***
I talked to Tachino-san and we think the information Fujitsu provided is enough. We reproduced the problem on an ia64 machine here in Westford. I am going to check with the Fujitsu Support team in Japan whether they have any other test cases. Red Hat Tools team, please investigate the problem.
Nagahama-san: Yes, this is a problem on an ia64 machine. We do not use the 32-bit EL. We tried RHEL4-rc. Because the results for RHEL4-Beta2 and RHEL4-rc were different, another bug was filed. Regards, Fujitsu Japan Support team, Yoneda
A patch has been built into gdb-6.3.0.0-0.10 that prevents the SIGSEGV that occurred when running the given test.
We would like to test gdb-6.3.0.0-0.10. Please provide it.
It should be in Rawhide, because it was built for Fedora Core 4. If that works, we'll put it in RHEL4-U1. Can they try that? http://download.fedora.redhat.com/pub/fedora/linux/core/development/i386/Fedora/RPMS/ has the i386 RPMs; analogous directories have the other arches. Alternatively, I can put it on my ftp page.
To clarify, the ia64 RPMs are in: http://download.fedora.redhat.com/pub/fedora/linux/core/development/ia64/Fedora/RPMS/
The problem reported in Bugzilla #145309 ("gdb cause SIGSEGV.") no longer occurs. However, another problem has appeared: part of the backtrace cannot be displayed. The issue is not yet solved for us, because we still cannot debug using gdb, so we will keep reporting this problem under Bugzilla #145309 as circumstances require.

Steps to Reproduce:
1) Compile the attached test program.
   $ gcc -lpthread -o thread thread.c
   If you look at the source you will understand it easily. It is a really simple test program: threads are created and sleep.
2) Execute the program.
   $ ./thread &
   [1] 13884
3) Gather a core while it is running.
   $ gcore 13884
4) Examine the core with gdb.
   $ gdb ./thread core.13884

Actual results:
=====
GNU gdb Red Hat Linux (6.3.0.0-0.10rh)
(Omitted)
Core was generated by `/work/testpro/thread'.
(Omitted)
warning: svr4_current_sos: Can't read pathname for load map: Input/output error
(Omitted)
#0  0xa000000000010641 in ?? ()
(gdb) bt
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x0000000000000000 in ?? ()
(gdb) thread apply all bt

Thread 3 (process 13884):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000090110 in default_attr () from /lib/tls/libpthread.so.0

Thread 2 (process 13886):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x0000000000000000 in ?? ()

Thread 1 (process 13885):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x0000000000000000 in ?? ()
=====

Expected results:
The following results are from an RHEL v4 + i386 machine. On the RHEL v4 + IPF machine we expect all backtraces to be shown like this.
=====
$ gdb ./thread core.16397
GNU gdb Red Hat Linux (6.3.0.0-0.10rh)
Copyright 2004 Free Software Foundation, Inc.
(Omitted)
#0  0x00d2b15c in __nanosleep_nocancel () from /lib/tls/i686/libc.so.6
(gdb) bt
#0  0x00d2b15c in __nanosleep_nocancel () from /lib/tls/i686/libc.so.6
#1  0x00d2af97 in sleep () from /lib/tls/i686/libc.so.6
#2  0x08048592 in threadA ()
#3  0x00c5e0dd in start_thread () from /lib/tls/i686/libpthread.so.0
#4  0x00d62a2e in clone () from /lib/tls/i686/libc.so.6
(gdb) thread apply all bt

Thread 3 (process 16397):
#0  0x00d2b15c in __nanosleep_nocancel () from /lib/tls/i686/libc.so.6
#1  0x00d2af97 in sleep () from /lib/tls/i686/libc.so.6
#2  0x080484fe in main ()

Thread 2 (process 16399):
#0  0x00d2b15c in __nanosleep_nocancel () from /lib/tls/i686/libc.so.6
#1  0x00d2af97 in sleep () from /lib/tls/i686/libc.so.6
#2  0x080485d2 in threadB ()
#3  0x00c5e0dd in start_thread () from /lib/tls/i686/libpthread.so.0
#4  0x00d62a2e in clone () from /lib/tls/i686/libc.so.6

Thread 1 (process 16398):
#0  0x00d2b15c in __nanosleep_nocancel () from /lib/tls/i686/libc.so.6
#1  0x00d2af97 in sleep () from /lib/tls/i686/libc.so.6
#2  0x08048592 in threadA ()
#3  0x00c5e0dd in start_thread () from /lib/tls/i686/libpthread.so.0
#4  0x00d62a2e in clone () from /lib/tls/i686/libc.so.6
=====

Influence:
If the backtrace of even a simple test program like the one attached cannot be shown, the middleware cannot be debugged. In fact, we cannot debug our middleware at all, and our middleware development for RHEL v4 has been delayed.
Steps to Reproduce:
1) Compile the attached test program.
   $ gcc -lpthread -o thread thread.c
   If you look at the source you will understand it easily. It is a really simple test program: threads are created and sleep.
2) Execute the program.
   $ ./thread &
   [1] 13884
3) Gather a core while it is running.
   $ gcore 13884
4) Examine the core with gdb.
   $ gdb ./thread core.13884

cat thread.c
=====
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void *threadA(void *tname);
static void *threadB(void *tname);

int main(void)
{
    pthread_t thrdidA;
    pthread_t thrdidB;
    int ret;
    void *status;

    printf("TEST START\n");
    if ((ret = pthread_create(&thrdidA, NULL, threadA, (void *)"THREAD-A"))) {
        printf(" pthread_create ERROR errno=%d\n", ret);
    }
    if ((ret = pthread_create(&thrdidB, NULL, threadB, (void *)"THREAD-B"))) {
        printf(" pthread_create ERROR errno=%d\n", ret);
    }
    sleep(10);
    if ((ret = pthread_join(thrdidA, &status))) {
        printf(" pthread_join ERROR errno=%d\n", ret);
    }
    if ((ret = pthread_join(thrdidB, &status))) {
        printf(" pthread_join ERROR errno=%d\n", ret);
    }
    printf("TEST END\n");
    return 0;
}

static void *threadA(void *tname)
{
    printf("%s START\n", (char *)tname);
    sleep(10);
    printf("%s END\n", (char *)tname);
    return NULL;
}

static void *threadB(void *tname)
{
    printf("%s START\n", (char *)tname);
    sleep(10);
    printf("%s END\n", (char *)tname);
    return NULL;
}
=====
We tested gdb-6.3.0.0-0.13.ia64.rpm. Our "Expected results" have still not been achieved. In Case 1 ("refer to the corefile") the SIGSEGV no longer appears, but part of the stack cannot be displayed. In Case 2 ("attach gdb to the process while it is running") the stack can be displayed. We expect all stacks to be displayable when referring to the corefile with gdb.

[Steps to Reproduce:]
=====
Case 1: "Refer to the corefile"
1) Compile the attached test program.
   $ gcc -lpthread -o thread thread.c
   If you look at the source you will understand it easily. It is a really simple test program: threads are created and sleep.
2) Execute the program.
   $ ./thread &
   [1] 13884
3) Gather a core while it is running.
   $ gcore 13884
4) Examine the core with gdb.
   $ gdb ./thread core.13884
GNU gdb Red Hat Linux (6.3.0.0-0.13rh)
(gdb) bt
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
Previous frame inner to this frame (corrupt stack?)
(gdb) thread apply all backtrace

Thread 3 (process 6704):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000090110 in default_attr () from /lib/tls/libpthread.so.0
#3  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
Previous frame inner to this frame (corrupt stack?)

Thread 2 (process 6706):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
Previous frame inner to this frame (corrupt stack?)

Thread 1 (process 6705):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x2000000000302c50 in _IO_wide_data_2 () from /lib/tls/libc.so.6.1
warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead
#3  0x20000000001c4420 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
Previous frame inner to this frame (corrupt stack?)
=====

[Actual results:]
The following error messages a) and b) appear, and part of the stack cannot be displayed.
a) "Previous frame inner to this frame (corrupt stack?)"
b) "warning: Can't fetch instructions for slot numbers greater than 2. Using slot 0 instead"

[Expected results:]
Case 2, "attach gdb to the process while it is running", shows the output we expect:
1) Execute the same program.
   $ ./thread &
   [1] 14070
2) Attach gdb to the process.
   $ gdb -p 14070
GNU gdb Red Hat Linux (6.3.0.0-0.13rh)
(gdb) bt
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x20000000001c4090 in sleep () from /lib/tls/libc.so.6.1
#3  0x40000000000009e0 in main ()
(gdb) thread apply all backtrace

Thread 3 (Thread 2305843009227420288 (LWP 14071)):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x20000000001c4090 in sleep () from /lib/tls/libc.so.6.1
#3  0x4000000000000b90 in threadA ()
#4  0x200000000007d630 in start_thread () from /lib/tls/libpthread.so.0
#5  0x200000000023ef90 in __clone2 () from /lib/tls/libc.so.6.1

Thread 2 (Thread 2305843009237906048 (LWP 14072)):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x20000000001c4090 in sleep () from /lib/tls/libc.so.6.1
#3  0x4000000000000c40 in threadB ()
#4  0x200000000007d630 in start_thread () from /lib/tls/libpthread.so.0
#5  0x200000000023ef90 in __clone2 () from /lib/tls/libc.so.6.1

Thread 1 (Thread 2305843009216872448 (LWP 14070)):
#0  0xa000000000010641 in ?? ()
#1  0x20000000001c4440 in __GC___libc_nanosleep () from /lib/tls/libc.so.6.1
#2  0x20000000001c4090 in sleep () from /lib/tls/libc.so.6.1
#3  0x40000000000009e0 in main ()

"__clone2", "threadA", etc. are displayed. This is the result we expect!
=====

If all stacks cannot be displayed even when referring to the corefile, gdb's debugging function is insufficient.

[Additional info:]
You said: "it (gdb-6.3.0.0-0.10) should be in rawhide, because it was built for fedora core4. If that works we'll put in RHEL4-U1." However, RHEL4-U1 is too late, because gdb versions before gdb-6.3.0.0-0.10 do not work! If you were a customer, would you use a gdb that shows only the signal handler and that itself dumps core? We strongly hope a working gdb will be provided and included in RHEL4-GA. We do not have much time left. Please do it as fast as you can. Regards, Fujitsu Japan Support team, Yoneda
Created attachment 110637 [details] test-program
A fix has been built into gdb-6.3.0.0-0.16. A mechanism used by gcore to read larger chunks of storage at a time is not working properly for threads. This fix falls back to an older, reliable mechanism that can only read small chunks at a time.
I read #147436 and learned about the problems with /proc/yyyy/mem and /proc/yyyy/task/xxxx/mem (yyyy is the main process pid and xxxx is the LWP of the thread). Then I tested gdb-6.3.0.0-0.16.ia64.rpm.

[Expected Result]
We expect all stacks to be displayed when referring to the corefile with gdb. I do not mind it taking more time by using PTRACE.

[Actual Result]
When referring to the corefile, part of the stack still cannot be displayed (same as gdb-6.3.0.0-0.13).

Will gdb-6.3.0.0-0.16 work fine once #147436 is solved? Or does gdb-6.3.0.0-0.16 still have other problems? Regards, Fujitsu Japan Support team, Yoneda
Problem reproduced. In my testing, I was issuing a "thread 2" command before running gcore. If you issue "thread 1" in your test, it should work fine with 0.16. What is happening is that there is still a call that reads /proc/xxx/mem storage, but it is meant to be invoked only for the main thread of a non-threaded program (i.e. so as not to hinder gcore's performance on an unthreaded program). That test is faulty and mistook your initial conditions above for the non-threaded case. Therefore, I have simply removed the call that reads /proc mem for the time being, and rebuilt the fix into gdb-6.3.0.0-0.18. When the kernel fix is made I will restore the call.
I tested gcore with 0.18. It works very slowly, but it works fine with my test program. Thanks, Fujitsu Japan Support team, Yoneda
Refer to Issue 66089.
gdb-6.3.0.0-0.31.ia64 has been tested and confirmed to resolve all issues shown above.
*** Bug 150068 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-241.html