Description of problem:
After installing RH9 (the media check passed) over a previously working RH8, most applications misbehave. Symptoms include bogus error messages (for example, awk '{print $1}' reports "cannot open file [garbage]") and crashes (the stack traces for ls and awk are shown below). I'm not sure whether things already went wrong during installation, but when I try to reinstall now, it fails even to create a boot disk (again because of awk errors). I don't know every program that fails; mount is the only other one I remember. The better question is probably which programs actually *work*; vim is one of them. I suspect a glibc problem, because similar symptoms appear in most of the programs I could run. I tried disabling NPTL with LD_ASSUME_KERNEL (as described in the release notes) and on the kernel command line; neither had any effect.

How reproducible:
The crashes happen every time, and the error messages (including the [garbage]) are identical on each run. I don't have the resources to reproduce it from scratch (installing RH8 anew, then RH9 over it), and it may not be reproducible at all because of my custom settings.

Actual results:
ls says "Segmentation violation".
awk does not work.

Expected results:
ls should print the directory listing.
awk should work.

Additional info:
The stack traces:

ls:
GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(no debugging symbols found)...
(gdb) Starting program: /bin/ls
(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x51ffffff in ?? ()
(gdb) #0  0x51ffffff in ?? ()
#1  0x08052fb7 in strcpy ()
#2  0x0804d4c8 in strcpy ()
#3  0x0804d7a1 in strcpy ()
#4  0x0804d8f1 in strcpy ()
#5  0x0804cb9f in strcpy ()
#6  0x0804b722 in strcpy ()
#7  0x08049f75 in strcpy ()
#8  0x42015574 in __libc_start_main () from /lib/tls/libc.so.6
(gdb)

awk (/tmp/awk_cmdl contains arbitrary awk commands):
(no debugging symbols found)...
(gdb) Starting program: /bin/awk -f /tmp/awk_cmdl /etc/fstab
(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x76093cd0 in ?? ()
(gdb) #0  0x76093cd0 in ?? ()
#1  0x0806591d in do_input ()
#2  0x0806a4dd in main ()
#3  0x42015574 in __libc_start_main () from /lib/tls/libc.so.6
(gdb)
In case it helps, here are the mount backtraces with and without NPTL:

Script started on Sun 13 Apr 2003 04:16:48 PM EEST
sh-2.05b# gdb mount <<< "r
bt"
(no debugging symbols found)...
(gdb) Starting program: /bin/mount
(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x4207a42b in strlen () from /lib/tls/libc.so.6
(gdb) #0  0x4207a42b in strlen () from /lib/tls/libc.so.6
#1  0x08051998 in error ()
#2  0x0804cb21 in strcpy ()
#3  0x42015574 in __libc_start_main () from /lib/tls/libc.so.6
(gdb)
sh-2.05b# exit
Script done on Sun 13 Apr 2003 04:18:53 PM EEST

Script started on Sun 13 Apr 2003 04:21:06 PM EEST
sh-2.05b# export LD_ASSUME_KERNEL=2.4.1
sh-2.05b# gdb mount <<< "r
> bt"
(no debugging symbols found)...
(gdb) Starting program: /bin/mount
(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x4009118b in strlen () from /lib/i686/libc.so.6
(gdb) #0  0x4009118b in strlen () from /lib/i686/libc.so.6
#1  0x08051998 in error ()
#2  0x0804cb21 in strcpy ()
#3  0x4002c8d7 in __libc_start_main () from /lib/i686/libc.so.6
(gdb)
sh-2.05b# exit
Script done on Sun 13 Apr 2003 04:21:47 PM EEST
If this were a generic problem, nothing would work at all. Did you reinstall the entire system, or did you update? If the latter, try a fresh install first. There is no way for me to guess what the problem is, since the backtraces are useless. Try to do some debugging yourself. Without more information, I will have to close the bug as WORKSFORME.
It was an upgrade, indeed. I'm not really interested in a fresh install, but I could try some debugging. Could you suggest some starting points? How can I trace the cause of the SIGSEGVs? It looks like memory corruption (?); note the strange awk behavior. What other information could be useful? Debuginfo packages would probably help, although I'm not sure where to find debuginfo for glibc and awk. Waiting for your comments.
There is not much I can say. First, try to find the exact call path. Does the program reach main()? If so, where does it really stop? (strcpy called from __libc_start_main cannot be right; __libc_start_main does not call that function.) If necessary, single-step at the assembly level up to the place where it crashes.
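A minimal sketch of a starting point, assuming a gdb that accepts the -batch, -x and --args options. The hypothetical helper write_gdb_cmds writes a gdb command file that runs the program and, once it stops on the signal, prints the backtrace, the loaded shared libraries (to see which libc is actually in use), the registers, the top of the stack and the instructions at the program counter. The x/8i $pc command is deliberately last: when the program counter is garbage, as in the ls trace (0x51ffffff), gdb cannot read that memory, and an error stops the rest of a command file.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): write a gdb command file
# that runs the program and dumps the state at the point where it stops.
# Assumes a gdb that understands -batch, -x and --args.
write_gdb_cmds() {
    # The quoted 'EOF' keeps the shell from expanding $sp and $pc.
    cat > "$1" <<'EOF'
run
bt
info sharedlibrary
info registers
x/16xw $sp
x/8i $pc
EOF
}

# Example usage (commented out; /tmp/crash.gdb is an arbitrary name,
# the awk arguments are the ones from the report):
#   write_gdb_cmds /tmp/crash.gdb
#   gdb -batch -x /tmp/crash.gdb --args /bin/ls
#   gdb -batch -x /tmp/crash.gdb --args /bin/awk -f /tmp/awk_cmdl /etc/fstab
```

For the single-stepping suggested above, the command file can be edited by hand instead: for example, break on main where that symbol exists (as in the awk trace), run, then repeat stepi together with x/i $pc to watch each instruction until the fault.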
If I were debugging this, I would check the integrity of the glibc packages (assuming the necessary programs don't segfault) by running:

    rpm -qa | grep glibc              # Check for multiple packages, and that the
                                      # glibc and glibc-common versions are the same.
    rpm -V glibc glibc-common mount   # Do the glibc and mount files match the
                                      # checksums from the packages?
    ldd /bin/mount                    # Are we getting the libc library from the right place?

Also, if you think there may be memory problems, try the memtest86 utility, available from http://www.memtest86.com/.
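The version comparison in the first check can be scripted. This is a minimal sketch, assuming the package list is produced with rpm's --queryformat option; check_glibc_versions is a hypothetical helper written in plain sh without awk, since awk itself is one of the suspect programs here.

```shell
#!/bin/sh
# Hypothetical helper: reads "NAME VERSION-RELEASE" lines on stdin, e.g. from
#   rpm -q --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n' glibc glibc-common
# Returns non-zero if there is not exactly one glibc package, or if the
# glibc and glibc-common versions differ.
check_glibc_versions() {
    glibc_count=0 glibc_ver= common_ver=
    while read -r name ver; do
        case $name in
            glibc)        glibc_count=$((glibc_count + 1)); glibc_ver=$ver ;;
            glibc-common) common_ver=$ver ;;
        esac
    done
    if [ "$glibc_count" -ne 1 ]; then
        echo "expected one glibc package, found $glibc_count"
        return 1
    fi
    if [ "$glibc_ver" != "$common_ver" ]; then
        echo "version mismatch: glibc $glibc_ver, glibc-common $common_ver"
        return 1
    fi
    echo "ok: glibc and glibc-common are both $glibc_ver"
}
```

Usage: rpm -q --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n' glibc glibc-common | check_glibc_versions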
I've got it up and running [up and crashing ;-)] in VMware now, so I'll try to debug it a bit further. rpm -qa | grep glibc shows version 2.3.2-27.9 for both glibc and glibc-common. rpm -V mount gawk gives the answer: both the mount and awk executables fail the MD5 check. That seems to be the cause of the problem. I'll probably look into it later and report back here, but I guess you can close the bug now.
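The rpm -V check that found the problem can be narrowed to checksum failures. In rpm -V output, each reported file is preceded by a flag string whose third character is 5 when the file's digest no longer matches the package. The hypothetical helper digest_failures below prints only those paths; it is a sketch that assumes paths without spaces and at most one attribute letter (such as c for configuration files) between the flags and the path.

```shell
#!/bin/sh
# Hypothetical helper: filter `rpm -V` output on stdin and print the paths
# whose digest (third flag character, '5') differs from the package.
digest_failures() {
    while read -r flags rest; do
        case $flags in
            # First char: S, '.' or '?'; second: M, '.' or '?'; third: 5.
            [.S?][.M?]5*)
                # Drop a single attribute marker such as "c " before the path.
                case $rest in
                    [cdglr]\ *) rest=${rest#? } ;;
                esac
                echo "$rest"
                ;;
        esac
    done
}
```

Usage: rpm -V mount gawk | digest_failures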
User bug, broken binaries.