From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050416 Fedora/1.0.3-1.3.1 StumbleUpon/1.9993 Firefox/1.0.3

Description of problem:
This has been repeated 3 times on 3 different servers, 2 FC2, one Advanced Server: After creating a new database with Progress DB 9.1D09, SOMETHING as yet unknown is hosed regarding the root user. The behavior is that ALL binaries in the coreutils package will "Segmentation Fault". Even BASENAME will segfault before it gets to the "Usage:...." display.

Other users can log in and do normal tasks. However, any attempt by other users to su will fail. And if the system reboots, since MV, RM, and practically every other util used during startup segfaults, that box ain't coming up no more.

The prior times when we rebooted, it was possible to come up on a boot/rescue CD, but as soon as you chroot to the disk drive, everything starts segfaulting again.

Since BASENAME is the simplest program that fails, I had hoped to get the sources and recompile it with some printfs to see where in its initialization it is crapping out. However, after loading the 5.2.1 sources, attempts to "make basename" fail because "localedir.h" is not found, and I cannot seem to figure out where that lives to get the correct package.

Fortunately, this is on a test system, so until we lose power I can experiment.

I can run strace on basename with both a normal user and root and post the differences.

Version-Release number of selected component (if applicable):
coreutils-5.2.1

How reproducible:
Always

Steps to Reproduce:
1. Run Prodb command on Progress 9.1D09
2. Run basename, ls, anything
3.

Actual Results:
Segmentation fault; eventually you must reinstall linux from scratch.

Additional info:
Created attachment 113649: strace output of basename. Output of strace on the basename command. It seems to be the same for both root and my login.
What does 'dmesg' say at that point? Also, what is the output of 'rpm -Va'?
uhhhh...

[root@idiot mnop]# dmesg
Segmentation fault

hmmm, maybe you meant....

[root@idiot mnop]# cd /var/log
[root@idiot log]# tail dmesg
IPv6 over IPv4 tunneling driver
EXT3 FS on dm-0, internal journal
cdrom: open failed.
kjournald starting. Commit interval 5 seconds
EXT3 FS on hda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev hda1, type ext3), uses xattr
SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
Adding 196600k swap on /dev/VolGroup00/LogVol01. Priority:-1 extents:1
SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts

Regarding revisions, I set up yum on this machine and did a yum update before running the Progress job that broke it, so in theory it had all FC3 updates.

[root@idiot log]# rpm -Va
..5....T. c /etc/issue
.......T. c /etc/yum.repos.d/fedora-devel.repo
.......T. c /etc/yum.repos.d/fedora-updates-testing.repo
S.5....T. c /etc/yum.repos.d/fedora-updates.repo
S.5....T. c /etc/yum.repos.d/fedora.repo
prelink: /bin/basename: ELF headers changed since prelinking
S.?....T. /bin/basename
prelink: /bin/cat: ELF headers changed since prelinking
S.?....T. /bin/cat
prelink: /bin/chgrp: ELF headers changed since prelinking
S.?....T. /bin/chgrp
prelink: /bin/chmod: ELF headers changed since prelinking
S.?....T. /bin/chmod
prelink: /bin/cut: ELF headers changed since prelinking
S.?....T. /bin/cut
prelink: /bin/dd: ELF headers changed since prelinking
S.?....T. /bin/dd
prelink: /bin/df: ELF headers changed since prelinking
S.?....T. /bin/df
prelink: /bin/echo: ELF headers changed since prelinking
S.?....T. /bin/echo
prelink: /bin/false: ELF headers changed since prelinking
S.?....T. /bin/false
prelink: /bin/link: ELF headers changed since prelinking
S.?....T. /bin/link
prelink: /bin/ln: ELF headers changed since prelinking
S.?....T. /bin/ln
prelink: /bin/ls: ELF headers changed since prelinking
S.?....T. /bin/ls
prelink: /bin/mknod: ELF headers changed since prelinking
S.?....T. /bin/mknod
prelink: /bin/nice: ELF headers changed since prelinking
S.?....T. /bin/nice
prelink: /bin/rm: ELF headers changed since prelinking
S.?....T. /bin/rm
prelink: /bin/sleep: ELF headers changed since prelinking
S.?....T. /bin/sleep
prelink: /bin/sync: ELF headers changed since prelinking
S.?....T. /bin/sync
prelink: /bin/true: ELF headers changed since prelinking
S.?....T. /bin/true
prelink: /bin/uname: ELF headers changed since prelinking
S.?....T. /bin/uname
prelink: /bin/unlink: ELF headers changed since prelinking
S.?....T. /bin/unlink
SM5....T. c /etc/sysconfig/rhn/up2date
S.5....TC c /etc/sysconfig/rhn/up2date-uuid
.......T. c /etc/yp.conf
S.5....T. c /var/lib/games/mahjongg.easy.scores
S.5....T. c /etc/pam.d/system-auth
missing /usr/share/system-config-date/Clock.pyc
missing /usr/share/system-config-date/dateBackend.pyc
missing /usr/share/system-config-date/date_gui.pyc
missing /usr/share/system-config-date/mainWindow.pyc
missing /usr/share/system-config-date/system-config-date.pyc
missing /usr/share/system-config-date/timeconfig.pyc
missing /usr/share/system-config-date/timezoneBackend.pyc
missing /usr/share/system-config-date/timezone_gui.pyc
missing /usr/share/system-config-date/timezone_map_gui.pyc
missing /usr/share/system-config-date/zonetab.pyc
........? /var/lib/nfs/rpc_pipefs
prelink: /bin/gunzip: ELF headers changed since prelinking
S.?....T. /bin/gunzip
prelink: /bin/gzip: ELF headers changed since prelinking
S.?....T. /bin/gzip
prelink: /bin/zcat: ELF headers changed since prelinking
S.?....T. /bin/zcat
prelink: /bin/kbd_mode: ELF headers changed since prelinking
S.?....T. /bin/kbd_mode
prelink: /bin/loadkeys: ELF headers changed since prelinking
S.?....T. /bin/loadkeys
S.5....T. c /etc/ldap.conf
........C /var/lib/scrollkeeper

It seems to be chugging away still, so I'll post the rest later when/if it finishes, but I don't want to lose this much.
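For anyone reading the rpm -Va output above, here is a minimal sketch that spells out the attribute letters in a verify string such as "S.?....T.". The function name decode_verify_flags is made up for illustration. It only decodes the letters documented for rpm's verify mode (S, M, 5, D, L, U, G, T, plus "?" for a test that could not be run) and reports anything else, such as the "C" seen above, as not decoded rather than guessing. It uses only shell builtins, so it does not depend on coreutils.

```shell
# Sketch: explain each character of an rpm verify string.
# decode_verify_flags is a hypothetical helper, not part of rpm.
decode_verify_flags() {
    flags=$1
    while [ -n "$flags" ]; do
        rest=${flags#?}          # everything after the first character
        ch=${flags%"$rest"}      # the first character itself
        flags=$rest
        case $ch in
            .)  ;;               # a dot means the test passed
            S)  echo "S: file size differs" ;;
            M)  echo "M: mode (permissions or file type) differs" ;;
            5)  echo "5: digest (MD5 at the time) differs" ;;
            D)  echo "D: device major/minor number differs" ;;
            L)  echo "L: symlink target differs" ;;
            U)  echo "U: owning user differs" ;;
            G)  echo "G: owning group differs" ;;
            T)  echo "T: modification time differs" ;;
            \?) echo "?: a test could not be performed" ;;
            *)  echo "$ch: not decoded by this sketch" ;;
        esac
    done
}

decode_verify_flags "S.?....T."
```

Read this way, every /bin entry above changed in size and modification time, and the digest test could not be run at all, presumably because the prelink step reported an error for that file.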
BTW, if I could get some clues on the missing include files in coreutils package, I could compile another basename on my still working FC3 system and put in some printfs. Based on what I see, all the coreutils are dying in this section of code....

  initialize_main (&argc, &argv);
  program_name = argv[0];
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  atexit (close_stdout);

If I could even get a copy of basename with debugging info in it I could run it via GDB and get some clues on where the standard libraries are croaking.
Now this is interesting....

[neal@idiot bin]$ ls -lt /bin | more
total 6552
-rwxr-xr-x 1 root root 11549 Apr 25 13:17 arch
-rwxr-xr-x 1 root root 19937 Apr 25 13:17 aumix-minimal
-rwxr-xr-x 1 root root 22417 Apr 25 13:17 basename
-rwxr-xr-x 1 root root 623285 Apr 25 13:17 bash
-rwxr-xr-x 1 root root 26113 Apr 25 13:17 cat
-rwxr-xr-x 1 root root 42177 Apr 25 13:17 chgrp
-rwxr-xr-x 1 root root 41773 Apr 25 13:17 chmod
-rwxr-xr-x 1 root root 63437 Apr 25 13:17 cpio
-rwxr-xr-x 1 root root 36445 Apr 25 13:17 cut
-rwxr-xr-x 1 root root 39069 Apr 25 13:17 dd
-rwxr-xr-x 1 root root 42245 Apr 25 13:17 df
-rwxr-xr-x 1 root root 13393 Apr 25 13:17 dmesg
-rwxr-xr-x 1 root root 11405 Apr 25 13:17 doexec
-rwxr-xr-x 1 root root 23765 Apr 25 13:17 echo
-rwxr-xr-x 1 root root 54225 Apr 25 13:17 ed
-rwxr-xr-x 1 root root 21349 Apr 25 13:17 false
-rwxr-xr-x 1 root root 259385 Apr 25 13:17 gawk
-rwxr-xr-x 1 root root 17525 Apr 25 13:17 gettext
-rwxr-xr-x 1 root root 17777 Apr 25 13:17 hostname
-rwxr-xr-x 1 root root 14105 Apr 25 13:17 kbd_mode
-rwxr-xr-x 1 root root 16869 Apr 25 13:17 kill
-rwxr-xr-x 1 root root 190465 Apr 25 13:17 ksh
-rwxr-xr-x 1 root root 22413 Apr 25 13:17 link
-rwxr-xr-x 1 root root 86429 Apr 25 13:17 loadkeys
-rwxr-xr-x 1 root root 29661 Apr 25 13:17 login
-rwxr-xr-x 1 root root 92205 Apr 25 13:17 ls
-rwxr-xr-x 1 root mail 80561 Apr 25 13:17 mail
-rwxr-xr-x 1 root root 30045 Apr 25 13:17 mknod
-r-xr-xr-x 1 root root 15945 Apr 25 13:17 mktemp
-rwxr-xr-x 1 root root 39469 Apr 25 13:17 more
-rwxr-xr-x 1 root root 22037 Apr 25 13:17 mt
-rwxr-xr-x 1 root root 96813 Apr 25 13:17 netstat
-rwxr-xr-x 1 root root 25273 Apr 25 13:17 nice
-rwxr-xr-x 1 root root 260121 Apr 25 13:17 pgawk
-r-xr-xr-x 1 root root 73409 Apr 25 13:17 ps
-rwxr-xr-x 1 root root 41893 Apr 25 13:17 rm
-rwxr-xr-x 1 root root 52113 Apr 25 13:17 sed

Yesterday at 13:17 is about when I ran the Progress DB create, and indeed the relevant binaries don't have the same checksum as on my other FC3 system.
I'm copying them over now, and shall see if root can run them instead. It's still doing that rpm report now. Since no one but me has access to this machine, the question of how those files changed is certainly interesting.
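The checksum comparison against the other FC3 system could be scripted roughly as follows. This is only a sketch: compare_bins and both directory arguments are hypothetical names, and it assumes a working md5sum, so it would have to run as a non-root user or from a rescue environment on the affected box. Prelinked binaries can legitimately differ between two machines, so a mismatch is a prompt to look closer (rpm -V undoes prelinking before it compares), not proof of tampering.

```shell
# Sketch: list files in a suspect directory whose checksum differs from the
# same-named file in a known-good reference copy.
# compare_bins and both directory arguments are hypothetical names.
compare_bins() {
    suspect_dir=$1
    reference_dir=$2
    for ref in "$reference_dir"/*; do
        [ -f "$ref" ] || continue
        name=${ref##*/}                  # strip the directory without calling basename
        suspect="$suspect_dir/$name"
        if [ ! -f "$suspect" ]; then
            echo "MISSING $name"
            continue
        fi
        ref_sum=$(md5sum < "$ref")
        ref_sum=${ref_sum%% *}           # keep only the hash field
        sus_sum=$(md5sum < "$suspect")
        sus_sum=${sus_sum%% *}
        [ "$ref_sum" = "$sus_sum" ] || echo "DIFFERS $name"
    done
}
```

A call would look like compare_bins /bin /mnt/known-good/bin, where the second path is wherever the reference copy of /bin was placed.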
Whatever set of coreutils binaries you are running, they are not the ones that came from the RPM package, and neither are they the prelinked versions of the shipped binaries.
So, it still looks like the most direct way out of the swamp is to copy /bin from my other yum-updated FC3 system into somewhere on this one, and then see if I can use its cp command to cp that to /bin. (Remember my cp is broken also, along with rpm.) Assuming that gets it somewhat functioning, is there a way to ask "yum" to forcefully reapply the coreutils? Then I go back to the question of how a commercial DB product could whack that directory while building a new database. And how that whacking could work for everyone BUT root. I can make more careful notes as we experiment on doing it again. (Maybe even avoid running it as root, eh?)
With all due respect, I can see waving off this behavior as just another idiot linux user. However, we are still left with an apparently innocent behavior that makes FC3 inoperable, and after crashing 4 servers in the last 3 months and having to reload each time we are no closer to understanding "What the Heck Happened?". And once I lose power on this server we'll no longer be able to investigate this until the next time. My hunch is there is some dubious code in one of the shared libraries - basename only uses one though. Why it only affects the root user is a real puzzler.

After copying over /bin from another current FC3 system we still have the same behavior - root cannot run any coreutils although other users can just fine. rpm -Va now shows...

..5....T. c /etc/issue
.......T. c /etc/yum.repos.d/fedora-devel.repo
.......T. c /etc/yum.repos.d/fedora-updates-testing.repo
S.5....T. c /etc/yum.repos.d/fedora-updates.repo
S.5....T. c /etc/yum.repos.d/fedora.repo
prelink: /bin/basename: ELF headers changed since prelinking
S.?....T. /bin/basename
prelink: /bin/cat: at least one of file's dependencies has changed since prelinking
S.?....T. /bin/cat
prelink: /bin/chgrp: at least one of file's dependencies has changed since prelinking
S.?....T. /bin/chgrp
prelink: /bin/chmod: at least one of file's dependencies has changed since prelinking
S.?....T. /bin/chmod
prelink: /bin/chown: at least one of file's dependencies has changed since prelinking

Taking a wild guess that this might be a library, and using what we gleaned from ldd, we copied /lib/tls/* from the other working FC3 system, but it is no better and no worse, and we still get the same messages.

I've apparently got yum working. So I've got some tools if I could force yum to update.
Well, what's LD_PRELOAD/LD_LIBRARY_PATH for root?
OK, I'm really trying not to be dense here. I have used xenix/unix/aix for 20 years, but you've got me stumped. Neither env nor set shows either of the above variables, so I don't know how to answer that. They are not set for non-root users either, so that would be identical. A quick google search indicates maybe my hunch on what you are asking is right.
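A minimal sketch of the check being asked about, written with shell builtins only so it still runs when coreutils segfault. The function name show_loader_overrides is made up for illustration. It also looks at /etc/ld.so.preload, the system-wide preload list described in the ld.so manual page; that file is an extra place to look and is not something raised earlier in this thread, and being system-wide it would not by itself explain a root-only failure.

```shell
# Sketch: print the dynamic-loader overrides visible to the current shell.
# show_loader_overrides is a hypothetical helper name.
show_loader_overrides() {
    # ${VAR:-...} prints the fallback when VAR is unset or empty
    echo "LD_PRELOAD=${LD_PRELOAD:-<unset>}"
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
    if [ -s /etc/ld.so.preload ]; then
        echo "/etc/ld.so.preload contains:"
        while IFS= read -r line; do
            echo "  $line"
        done < /etc/ld.so.preload
    else
        echo "/etc/ld.so.preload is absent or empty"
    fi
}
```

Running show_loader_overrides once as root and once as a normal user, then comparing the two outputs, answers the question directly.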
Ok, someone gave me a clue on recompiling coreutils, and I modified basename.c thusly:

main (int argc, char **argv)
{
  char *name;

  printf("1"); fflush(stdout);
  initialize_main (&argc, &argv);
  printf("2"); fflush(stdout);
  program_name = argv[0];
  printf("3"); fflush(stdout);
  setlocale (LC_ALL, "");
  printf("4"); fflush(stdout);
  bindtextdomain (PACKAGE, LOCALEDIR);
  printf("5"); fflush(stdout);
  textdomain (PACKAGE);
  printf("6"); fflush(stdout);
  atexit (close_stdout);
  printf("7"); fflush(stdout);

Compiled, sent over to the crippled box, and ran it as normal user:

[neal@idiot ~]$ ./basename
1234567./basename: too few arguments
Try `./basename --help' for more information.

as you would expect, and as root:

[root@idiot neal]# ./basename
Segmentation fault
[root@idiot neal]#

[root@idiot neal]# gdb ./basename
GNU gdb Red Hat Linux (6.1post-1.20040607.43rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/i586/libthread_db.so.1".

(gdb) start
Breakpoint 1 at 0x8048c59: file basename.c, line 91.
Starting program: /home/neal/basename

[6]+ Stopped gdb ./basename
[root@idiot neal]#

which is exactly what I got trying it as normal user.

I surmise from the printfs that it is crapping out BEFORE even getting to main(), somewhere in the C initialization. Which puts it out of my league to debug. I tried running from gdb:
For some reason I decided to run clamscan and found that one of the Progress binaries had the Linux.Rst.A virus in it, as did most of /bin. I'm gradually correcting and scanning. Fortunately, rpm works so I can install clamav on the problem server.