Description of problem:
The stack reported by gdb is corrupted when attaching to a running process compiled with -m32 on a Dell PE2850 (x86_64). Things are fine when -m32 is not supplied.

Version-Release number of selected component (if applicable):
kernel-2.6.12-1.1381_FC3smp
gdb-6.1post-1.20040607.43.0.1
glibc-2.3.6-0.fc3.1
gcc-3.4.4-2.fc3

How reproducible:
With the following test program, test.c:

#include <stdio.h>
main()
{
    int i=0;
    while(++i) {
        printf("%d\n", i);
        sleep(1);
    }
}

Steps to Reproduce:
1. gcc -g -m32 test.c
2. ./a.out
3. Attach gdb to the running a.out

Actual results:

snap@tp105:/usr/snap 220 % gdb - 31414
GNU gdb Red Hat Linux (6.1post-1.20040607.43.0.1rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...-: No such file or directory.
Attaching to process 31414
Reading symbols from /usr/snap/tmp/test/a.out...done.
Using host libthread_db library "/lib64/tls/libthread_db.so.1".
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0xffffe410 in ?? ()
(gdb) bt
#0 0xffffe410 in ?? ()
#1 0xffffc208 in ?? ()
#2 0xf7fceff4 in ?? () from /lib/tls/libc.so.6
#3 0xffffc064 in ?? ()
#4 0xf7f32590 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#5 0xf7f323bc in sleep () from /lib/tls/libc.so.6
Previous frame inner to this frame (corrupt stack?)
(gdb)

Expected results:

GNU gdb Red Hat Linux (6.1post-1.20040607.43.0.1rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...-: No such file or directory.
Attaching to process 31456
Reading symbols from /usr/snap/tmp/test/a.out...done.
Using host libthread_db library "/lib64/tls/libthread_db.so.1".
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00002aaaaac4fd32 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
(gdb) bt
#0 0x00002aaaaac4fd32 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1 0x00002aaaaac4fbd0 in sleep () from /lib64/tls/libc.so.6
#2 0x0000000000400534 in main () at test.c:10
(gdb)

Additional info:
The expected result above was produced by attaching gdb to an a.out compiled without -m32.
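Since the symptom depends entirely on whether the binary was built with -m32, it may be worth confirming which ELF class each a.out actually has before attaching. The following is a minimal sketch in C and is not part of the original report. The helper names elf_class_bits and elf_file_bits are hypothetical, and the constants are the standard ELF identification values (magic bytes 0x7f 'E' 'L' 'F', class byte at offset 4, 1 for 32-bit, 2 for 64-bit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Standard ELF identification layout (same values as in <elf.h>). */
enum {
    IDENT_SIZE  = 16, /* EI_NIDENT   */
    CLASS_INDEX = 4,  /* EI_CLASS    */
    CLASS_32    = 1,  /* ELFCLASS32  */
    CLASS_64    = 2   /* ELFCLASS64  */
};

/* Hypothetical helper: returns 32 or 64 for a valid ELF identification
 * block, or 0 if the bytes are too short or not an ELF header. */
int elf_class_bits(const unsigned char *ident, size_t len)
{
    assert(ident != NULL);
    if (len < (size_t)IDENT_SIZE || memcmp(ident, "\177ELF", 4) != 0)
        return 0;
    switch (ident[CLASS_INDEX]) {
    case CLASS_32: return 32;
    case CLASS_64: return 64;
    default:       return 0;
    }
}

/* Hypothetical helper: reads the identification bytes of a file and
 * classifies them; returns 0 if the file is unreadable or not ELF. */
int elf_file_bits(const char *path)
{
    unsigned char ident[IDENT_SIZE];
    size_t n;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    n = fread(ident, 1, sizeof ident, f);
    fclose(f);
    return elf_class_bits(ident, n);
}
```

For the two builds in this report, elf_file_bits("a.out") would be expected to return 32 for the -m32 build and 64 for the default build.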
Does the problem exist in gdb's shipped in later FC releases as well? Could you test the latest rawhide and/or the upcoming FC5?
(In reply to comment #1)
> Does the problem exist in gdb's shipped in later FC releases as well? Could you
> test the latest rawhide and/or the upcoming FC5?

I have tested with gdb-6.3.0.0-1.84.x86_64. The stack is still corrupted. However, there is a warning about the VSYSCALL page, and I have no idea how to address it.

I have also tested with the non-SMP version of kernel-2.6.12-1.1381_FC3. The result is the same.

I could not test with the gdb shipped with FC5, as it requires glibc-2.4.

The following is the output of gdb-6.3.0.0-1.84.x86_64:

snap@tp105:/usr/snap/tmp/test 354 % gdb - 4604
GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...-: No such file or directory.
Attaching to process 4604
warning: The current VSYSCALL page code requires an existing execuitable.
Use "add-symbol-file-from-memory" to load the VSYSCALL page by hand
Reading symbols from /usr/snap/tmp/test/a.out...done.
Using host libthread_db library "/lib64/tls/libthread_db.so.1".
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0xffffe410 in ?? ()
(gdb) bt
#0 0xffffe410 in ?? ()
#1 0xffffcee8 in ?? ()
#2 0xf7fceff4 in ?? () from /lib/tls/libc.so.6
#3 0xffffcd44 in ?? ()
#4 0xf7f32590 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#5 0xf7f323bc in sleep () from /lib/tls/libc.so.6
Previous frame inner to this frame (corrupt stack?)
(gdb) help add-symbol-file-from-memory
Load the symbols out of memory from a dynamically loaded object file.
Give an expression for the address of the file's shared object file header.
(gdb)
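Frame #0 at 0xffffe410 has no symbol in the backtrace, which would be consistent with the VSYSCALL warning if that address lies inside the vsyscall page rather than in a.out or libc. A minimal sketch in C of that check follows. It rests on an assumption that nothing in this report confirms: that the 32-bit vsyscall page on this kernel is a single 4 KiB page mapped at 0xffffe000. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION (not confirmed in this report): the 32-bit vsyscall page
 * is one 4 KiB page mapped at this fixed address. */
#define VSYSCALL32_BASE_ASSUMED 0xffffe000u
#define VSYSCALL32_SIZE_ASSUMED 0x1000u

/* Hypothetical helper: nonzero if pc lies inside the assumed page. */
int in_vsyscall32_page(uint32_t pc)
{
    return pc >= VSYSCALL32_BASE_ASSUMED &&
           pc - VSYSCALL32_BASE_ASSUMED < VSYSCALL32_SIZE_ASSUMED;
}

/* Hypothetical helper: offset of pc from the start of the assumed page.
 * Only meaningful for addresses that are inside the page. */
uint32_t vsyscall32_offset(uint32_t pc)
{
    assert(in_vsyscall32_page(pc));
    return pc - VSYSCALL32_BASE_ASSUMED;
}
```

If the assumption holds, 0xffffe410 falls inside that page at offset 0x410, while 0xffffcee8 and 0xf7f32590 from the same backtrace do not, and the page's base address would presumably be the expression that add-symbol-file-from-memory asks for.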
Created attachment 126256
Test on another x86_64

I cannot reproduce the error you see. I tried your test program on an x86_64 system in a chrooted FC3 environment. See the attachment for the results.

A few questions:

* Are your system's packages updated to the most recent versions?
* Are you running in any kind of SELinux environment? If so, have you tried turning it off?
* What is the setting of your kernel vdso variable, that is, /proc/sys/kernel/vdso? If it's not 0, I wonder whether running
  # echo 0 >/proc/sys/kernel/vdso
  makes a difference.

This might be both a gdb and a kernel problem. There are at least a couple of similar Bugzilla bugs that were worked on for RHEL 4:

* Bug 146087 - Can't debug 32 bit apps running on x86_64
* Bug 146803 - 32bit gdb doesn't work on x84_64

Bear in mind that the kernel in question for Bug 146803 is 2.6.9-something...
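The vdso question above comes down to whether /proc/sys/kernel/vdso exists and what integer it holds. A minimal sketch in C of that check, assuming the file, where present, contains a single integer; the helper name read_int_file is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one integer from a /proc-style file.
 * Returns 0 and stores the value in *out on success; returns -1 if the
 * file cannot be opened or does not start with an integer. */
int read_int_file(const char *path, int *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling read_int_file("/proc/sys/kernel/vdso", &v) returns -1 on a kernel that does not provide the file, which separates "the setting is 0" from "the setting does not exist".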
(In reply to comment #3)

Your result doesn't look right either, although it is not identical to what I see. I would expect bt to show function names with line numbers, such as:

(gdb) bt
#0 0x00002aaaaac4fd32 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1 0x00002aaaaac4fbd0 in sleep () from /lib64/tls/libc.so.6
#2 0x0000000000400534 in main () at test.c:10
(gdb)

Regarding your questions:

1. I have updated all possible components from the most recent FC3 updates, such as kernel, gdb, gcc, and glibc. What else am I missing?
2. SELinux is off.
3. There is no /proc/sys/kernel/vdso on my computer. I cannot create one even as root, since the file system is not writable. How can I get it defined?

A question: this bug looks similar to Bug 146087 and Bug 166083, which seem to have been fixed. Why does it still show up in the latest FC3 release?
Regarding your point 3) in comment 4: My bad. I think only the FC2 kernels have /proc/sys/kernel/vdso; maybe the original FC3 kernels (2.6.9) had it too. But the latest FC3 kernel, as you know, is 2.6.12, so the kernel developers must have done away with it.

It is possible that the kernel portion of this problem was fixed in FC3's upgrade to 2.6.12, but I rather doubt it. The excerpt in my log file was *really* produced on a RHEL 4 (CentOS 4, actually) kernel (kernel-2.6.9-22.0.2.EL), the latest RHEL 4 kernel, which includes fixes for this very problem. It may be the kernel, because I'm not getting the error you get when I run it, and as far as I can tell, we're both using the latest FC3 gdb (x86_64) binaries.

The GDB portion? Well, the bug was reported at the time RHEL 4 was a release candidate (Feb. '05), and it looks like it wasn't officially fixed until a few months later, in May (http://rhn.redhat.com/errata/RHBA-2005-187.html).

> Why does it still show up in the latest FC3 release?

I don't know. Perhaps because no bug was filed for this problem against FC3 at the time. The Fedora Legacy Project focuses on security problems, so we really don't have a lot of resources to invest in fixing this for FC3. We're up to our ears in fixing security-related problems.

So, a couple of things I might suggest:

(1) You may want to try RHEL 4's gdb to see if it ameliorates the problem, if you are willing to build that version of gdb from source, as it should at any rate have the gdb fixes. It may also be available in binary form from an alternative open source supplier, but compiling from source on the system you are currently running will guarantee compatibility with your FC3 libraries and other applications.

Or (2), see if you can find out whether there is a patch available for the 2.6.12 kernel for this issue.
The patch mentioned for the kernel in Bug 146803 is for the 2.6.9 kernel (the RHEL 4 kernel), and I don't believe it will apply to the 2.6.12 kernel. If you do find such a patch, please let us know in this or a new bug report (you can reopen this one if you want), so that we can put it in the queue for inclusion in FC3's kernel the next time we update it for security issues.

Or (3), you could try building and installing RHEL 4's kernel-2.6.9-22.0.2.EL on FC3, and see if that helps.

Hope this helps. Sorry we can't do more. I am closing this bug CANTFIX for now, but you are welcome to reopen it if you find something we haven't found that can help us solve this problem. Thanks.
Created attachment 126290
execution log with kernel-2.6.9-22.0.2.EL and gdb-6.3.0.0-0.30.1

(In reply to comment #5)

I got the kernel and gdb as you suggested:

kernel-2.6.9-22.0.2.EL
gdb-6.3.0.0-0.30.1

However, the problem does not go away; see the attachment. It has the same symptom as Bug 146087, i.e., you get the correct backtrace after a few stepi commands.

I understand you are not fixing the FC3 kernel. But PLEASE help find a working kernel/gdb combination. Thanks!!!
I'd get the latest RHEL 4 gdb:

http://mirrors.kernel.org/redhat/redhat/linux/updates/enterprise/4AS/en/os/SRPMS/gdb-6.3.0.0-1.96.src.rpm

Build it and give it a try.
(In reply to comment #7)

Thanks for the information. However, no luck! I have also tried CentOS 4.3, and the bug is there as well. It seems to me that this bug has not been fixed properly. I am going to file a bug report with CentOS. In the meantime, I would appreciate it if you could let me know of any news in this regard. Thanks.