Bug 155925 - All coreutils binaries segfault, requiring complete reinstall of linux
Summary: All coreutils binaries segfault, requiring complete reinstall of linux
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: coreutils
Version: 3
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Tim Waugh
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-04-25 19:58 UTC by Neal Rhodes
Modified: 2007-11-30 22:11 UTC
CC List: 0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-04-27 18:59:29 UTC
Type: ---
Embargoed:


Attachments
strace output of basename. (4.22 KB, text/plain)
2005-04-25 20:10 UTC, Neal Rhodes

Description Neal Rhodes 2005-04-25 19:58:01 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050416 Fedora/1.0.3-1.3.1 StumbleUpon/1.9993 Firefox/1.0.3

Description of problem:
This has been repeated 3 times on 3 different servers (2 FC2, one Advanced Server): after creating a new database with Progress DB 9.1D09, SOMETHING as yet unknown is hosed regarding the root user. The behavior is that ALL binaries in the coreutils package will "Segmentation Fault". Even BASENAME will segfault before it gets to the "Usage:...." display.

Other users can log in and do normal tasks. However, any attempt by other users to su will fail. And if the system reboots, since MV, RM, and practically every other utility used during startup segfaults, that box isn't coming back up.

On previous occasions, after rebooting, we could come up on a boot/rescue CD, but as soon as we chroot to the hard disk, everything starts segfaulting again.

Since BASENAME is the simplest program that fails, I had hoped to get the sources and recompile it with some printfs to see where in its initialization it is crapping out. However, after loading the 5.2.1 sources, attempts to "make basename" fail because "localedir.h" is not found, and I cannot figure out where that lives to get the correct package.

Fortunately, this is on a test system, so until we lose power I can experiment.

I can run strace on basename as both a normal user and root and post the differences.

Version-Release number of selected component (if applicable):
coreutils-5.2.1

How reproducible:
Always

Steps to Reproduce:
1. Run the prodb command on Progress 9.1D09 to create a new database.
2. Run basename, ls, or anything else.

Actual Results:  Segmentation fault; eventually you must reinstall linux from scratch. 

Additional info:

Comment 1 Neal Rhodes 2005-04-25 20:10:25 UTC
Created attachment 113649
strace output of basename. 

Output of strace on the basename command. It seems the same for both root and my login.

Comment 2 Tim Waugh 2005-04-26 09:57:30 UTC
What does 'dmesg' say at that point?

Also, what is the output of 'rpm -Va'?

Comment 3 Neal Rhodes 2005-04-26 15:42:56 UTC
uhhhh...
[root@idiot mnop]# dmesg
Segmentation fault
hmmm, maybe you meant....
[root@idiot mnop]# cd /var/log
[root@idiot log]# tail dmesg
IPv6 over IPv4 tunneling driver
EXT3 FS on dm-0, internal journal
cdrom: open failed.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev hda1, type ext3), uses xattr
SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
Adding 196600k swap on /dev/VolGroup00/LogVol01.  Priority:-1 extents:1
SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts

Regarding revisions, I set up yum on this machine and did a yum update before
running the Progress job that broke it, so in theory it had all FC3 updates.

[root@idiot log]# rpm -Va
..5....T. c /etc/issue
.......T. c /etc/yum.repos.d/fedora-devel.repo
.......T. c /etc/yum.repos.d/fedora-updates-testing.repo
S.5....T. c /etc/yum.repos.d/fedora-updates.repo
S.5....T. c /etc/yum.repos.d/fedora.repo
prelink: /bin/basename: ELF headers changed since prelinking
S.?....T.   /bin/basename
prelink: /bin/cat: ELF headers changed since prelinking
S.?....T.   /bin/cat
prelink: /bin/chgrp: ELF headers changed since prelinking
S.?....T.   /bin/chgrp
prelink: /bin/chmod: ELF headers changed since prelinking
S.?....T.   /bin/chmod
prelink: /bin/cut: ELF headers changed since prelinking
S.?....T.   /bin/cut
prelink: /bin/dd: ELF headers changed since prelinking
S.?....T.   /bin/dd
prelink: /bin/df: ELF headers changed since prelinking
S.?....T.   /bin/df
prelink: /bin/echo: ELF headers changed since prelinking
S.?....T.   /bin/echo
prelink: /bin/false: ELF headers changed since prelinking
S.?....T.   /bin/false
prelink: /bin/link: ELF headers changed since prelinking
S.?....T.   /bin/link
prelink: /bin/ln: ELF headers changed since prelinking
S.?....T.   /bin/ln
prelink: /bin/ls: ELF headers changed since prelinking
S.?....T.   /bin/ls
prelink: /bin/mknod: ELF headers changed since prelinking
S.?....T.   /bin/mknod
prelink: /bin/nice: ELF headers changed since prelinking
S.?....T.   /bin/nice
prelink: /bin/rm: ELF headers changed since prelinking
S.?....T.   /bin/rm
prelink: /bin/sleep: ELF headers changed since prelinking
S.?....T.   /bin/sleep
prelink: /bin/sync: ELF headers changed since prelinking
S.?....T.   /bin/sync
prelink: /bin/true: ELF headers changed since prelinking
S.?....T.   /bin/true
prelink: /bin/uname: ELF headers changed since prelinking
S.?....T.   /bin/uname
prelink: /bin/unlink: ELF headers changed since prelinking
S.?....T.   /bin/unlink
SM5....T. c /etc/sysconfig/rhn/up2date
S.5....TC c /etc/sysconfig/rhn/up2date-uuid
.......T. c /etc/yp.conf
S.5....T. c /var/lib/games/mahjongg.easy.scores
S.5....T. c /etc/pam.d/system-auth
missing     /usr/share/system-config-date/Clock.pyc
missing     /usr/share/system-config-date/dateBackend.pyc
missing     /usr/share/system-config-date/date_gui.pyc
missing     /usr/share/system-config-date/mainWindow.pyc
missing     /usr/share/system-config-date/system-config-date.pyc
missing     /usr/share/system-config-date/timeconfig.pyc
missing     /usr/share/system-config-date/timezoneBackend.pyc
missing     /usr/share/system-config-date/timezone_gui.pyc
missing     /usr/share/system-config-date/timezone_map_gui.pyc
missing     /usr/share/system-config-date/zonetab.pyc
........?   /var/lib/nfs/rpc_pipefs
prelink: /bin/gunzip: ELF headers changed since prelinking
S.?....T.   /bin/gunzip
prelink: /bin/gzip: ELF headers changed since prelinking
S.?....T.   /bin/gzip
prelink: /bin/zcat: ELF headers changed since prelinking
S.?....T.   /bin/zcat
prelink: /bin/kbd_mode: ELF headers changed since prelinking
S.?....T.   /bin/kbd_mode
prelink: /bin/loadkeys: ELF headers changed since prelinking
S.?....T.   /bin/loadkeys
S.5....T. c /etc/ldap.conf
........C   /var/lib/scrollkeeper


It seems to still be chugging away, so I'll post the rest later when/if it
finishes, but I don't want to lose this much.
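The nine-character column at the start of each rpm -Va line is one test per position: '.' means the test passed, '?' means it could not be performed, and a letter means it failed. A trailing 'c' marks a %config file, and "missing" means the file is gone. Read that way, "S.?....T." on the /bin entries says size and mtime changed while the digest went unchecked, presumably because rpm could not undo the prelinking to compute the MD5 (matching the prelink messages). A small decoder sketch follows; the column meanings are those of rpm from this era, where the ninth column is the SELinux context (later rpm releases changed that column), and describe_verify is a hypothetical helper, not part of rpm.

```c
#include <stdio.h>
#include <string.h>

/* Column meanings for FC3-era rpm; later releases changed column 9. */
static const char *const VERIFY_TESTS[9] = {
    "size", "mode", "digest (MD5)", "device", "symlink target",
    "user", "group", "mtime", "SELinux context"
};

/* Describe a verify string such as "S.?....T." into buf.
   '.' = passed (skipped), '?' = could not be checked, other = changed.
   Returns the number of failed tests, or -1 for a malformed string. */
static int describe_verify(const char *flags, char *buf, size_t len)
{
    if (len == 0 || strlen(flags) != 9)
        return -1;
    int failed = 0;
    size_t used = 0;
    buf[0] = '\0';
    for (int i = 0; i < 9; i++) {
        if (flags[i] == '.')
            continue;                      /* test passed */
        const char *state;
        if (flags[i] == '?') {
            state = "unchecked";           /* rpm could not run the test */
        } else {
            state = "changed";             /* letter = test failed */
            failed++;
        }
        int n = snprintf(buf + used, len - used, "%s%s %s",
                         used ? ", " : "", VERIFY_TESTS[i], state);
        if (n < 0 || (size_t)n >= len - used)
            break;                         /* out of room: stop appending */
        used += (size_t)n;
    }
    return failed;
}
```

For "S.?....T." this produces "size changed, digest (MD5) unchecked, mtime changed" and returns 2.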

BTW, if I could get some clues on the missing include files in the coreutils
package, I could compile another basename on my still-working FC3 system and
put in some printfs. Based on what I see, all the coreutils are dying in
this section of code....

  initialize_main (&argc, &argv);
  program_name = argv[0];
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  atexit (close_stdout);

If I could even get a copy of basename with debugging info in it, I could run it
under GDB and get some clues on where the standard libraries are croaking.
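Since building inside the coreutils tree stalls on the missing localedir.h, one workaround is a tiny standalone file that replays the same start-up calls with hard-coded stand-ins. This is a sketch, assuming glibc (which provides bindtextdomain and textdomain in libc itself); SKETCH_PACKAGE and SKETCH_LOCALEDIR are hypothetical substitutes for the build-generated PACKAGE and LOCALEDIR, and the coreutils-only helpers initialize_main and close_stdout are left out. Checkpoints go to stderr through write(2), so nothing sits in a stdio buffer if the process dies on the next call.

```c
#include <libintl.h>
#include <locale.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-ins for values the coreutils build normally
   generates (PACKAGE from config.h, LOCALEDIR from localedir.h). */
#define SKETCH_PACKAGE   "coreutils"
#define SKETCH_LOCALEDIR "/usr/share/locale"

/* Unbuffered checkpoint: write(2) goes straight to the terminal. */
static void checkpoint(const char *tag)
{
    ssize_t r = write(STDERR_FILENO, tag, strlen(tag));
    (void)r;  /* a failed diagnostic write is not worth handling here */
}

/* Replays the locale/gettext start-up calls from basename.c.
   Returns the number of steps completed. */
static int replay_startup(void)
{
    int steps = 0;
    checkpoint("setlocale\n");
    setlocale(LC_ALL, "");
    steps++;
    checkpoint("bindtextdomain\n");
    bindtextdomain(SKETCH_PACKAGE, SKETCH_LOCALEDIR);
    steps++;
    checkpoint("textdomain\n");
    textdomain(SKETCH_PACKAGE);
    steps++;
    checkpoint("done\n");
    return steps;
}
```

Called as the first statement of a trivial main() and run as both users, the last checkpoint printed shows the last call reached; if nothing prints at all, the failure happened before main() was entered.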



Comment 4 Neal Rhodes 2005-04-26 16:06:07 UTC
Now this is interesting....
[neal@idiot bin]$ ls -lt /bin | more
total 6552
-rwxr-xr-x  1 root root  11549 Apr 25 13:17 arch
-rwxr-xr-x  1 root root  19937 Apr 25 13:17 aumix-minimal
-rwxr-xr-x  1 root root  22417 Apr 25 13:17 basename
-rwxr-xr-x  1 root root 623285 Apr 25 13:17 bash
-rwxr-xr-x  1 root root  26113 Apr 25 13:17 cat
-rwxr-xr-x  1 root root  42177 Apr 25 13:17 chgrp
-rwxr-xr-x  1 root root  41773 Apr 25 13:17 chmod
-rwxr-xr-x  1 root root  63437 Apr 25 13:17 cpio
-rwxr-xr-x  1 root root  36445 Apr 25 13:17 cut
-rwxr-xr-x  1 root root  39069 Apr 25 13:17 dd
-rwxr-xr-x  1 root root  42245 Apr 25 13:17 df
-rwxr-xr-x  1 root root  13393 Apr 25 13:17 dmesg
-rwxr-xr-x  1 root root  11405 Apr 25 13:17 doexec
-rwxr-xr-x  1 root root  23765 Apr 25 13:17 echo
-rwxr-xr-x  1 root root  54225 Apr 25 13:17 ed
-rwxr-xr-x  1 root root  21349 Apr 25 13:17 false
-rwxr-xr-x  1 root root 259385 Apr 25 13:17 gawk
-rwxr-xr-x  1 root root  17525 Apr 25 13:17 gettext
-rwxr-xr-x  1 root root  17777 Apr 25 13:17 hostname
-rwxr-xr-x  1 root root  14105 Apr 25 13:17 kbd_mode
-rwxr-xr-x  1 root root  16869 Apr 25 13:17 kill
-rwxr-xr-x  1 root root 190465 Apr 25 13:17 ksh
-rwxr-xr-x  1 root root  22413 Apr 25 13:17 link
-rwxr-xr-x  1 root root  86429 Apr 25 13:17 loadkeys
-rwxr-xr-x  1 root root  29661 Apr 25 13:17 login
-rwxr-xr-x  1 root root  92205 Apr 25 13:17 ls
-rwxr-xr-x  1 root mail  80561 Apr 25 13:17 mail
-rwxr-xr-x  1 root root  30045 Apr 25 13:17 mknod
-r-xr-xr-x  1 root root  15945 Apr 25 13:17 mktemp
-rwxr-xr-x  1 root root  39469 Apr 25 13:17 more
-rwxr-xr-x  1 root root  22037 Apr 25 13:17 mt
-rwxr-xr-x  1 root root  96813 Apr 25 13:17 netstat
-rwxr-xr-x  1 root root  25273 Apr 25 13:17 nice
-rwxr-xr-x  1 root root 260121 Apr 25 13:17 pgawk
-r-xr-xr-x  1 root root  73409 Apr 25 13:17 ps
-rwxr-xr-x  1 root root  41893 Apr 25 13:17 rm
-rwxr-xr-x  1 root root  52113 Apr 25 13:17 sed

Yesterday at 13:17 is about when I ran the Progress DB create, and indeed the
relevant binaries don't have the same checksum as on my other FC3 system. I'm
copying them over now and will see if root can run them instead. The rpm
report is still running.

Since no one but me has access to this machine, the question of how those files
changed is certainly interesting.


Comment 5 Tim Waugh 2005-04-26 16:21:43 UTC
Whatever set of coreutils binaries you are running, they are not the ones that
came from the RPM package, and neither are they the prelinked versions of the
shipped binaries.

Comment 6 Neal Rhodes 2005-04-26 16:35:23 UTC
So it still looks like the most direct way out of the swamp is to copy /bin
from my other yum-updated FC3 system to somewhere on this one, and then see if
I can use its cp command to copy that to /bin. (Remember, my cp is broken too,
along with rpm.)

Assuming that gets it somewhat functioning, is there a way to ask "yum" to
forcefully reapply the coreutils? 

Then I go back to the question of how a commercial DB product could whack that
directory while building a new database, and how that whacking could work for
everyone BUT root. I can make more careful notes as we experiment with doing it
again. (Maybe even avoid running it as root, eh?)



Comment 7 Neal Rhodes 2005-04-26 21:05:47 UTC
With all due respect, I can see waving off this behavior as just another idiot
Linux user. However, we are still left with apparently innocent behavior
that makes FC3 inoperable, and after crashing 4 servers in the last 3 months
and having to reload each time, we are no closer to understanding "What the Heck
Happened?". And once I lose power on this server, we'll no longer be able to
investigate this until the next time.

My hunch is that there is some dubious code in one of the shared libraries, though
basename only uses one. Why it only affects the root user is a real puzzler.

After copying over /bin from another current FC3 system we still have the same
behavior - root cannot run any coreutils although other users can just fine. 

rpm -Va now shows...
..5....T. c /etc/issue
.......T. c /etc/yum.repos.d/fedora-devel.repo
.......T. c /etc/yum.repos.d/fedora-updates-testing.repo
S.5....T. c /etc/yum.repos.d/fedora-updates.repo
S.5....T. c /etc/yum.repos.d/fedora.repo
prelink: /bin/basename: ELF headers changed since prelinking
S.?....T.   /bin/basename
prelink: /bin/cat: at least one of file's dependencies has changed since prelinking
S.?....T.   /bin/cat
prelink: /bin/chgrp: at least one of file's dependencies has changed since prelinking
S.?....T.   /bin/chgrp
prelink: /bin/chmod: at least one of file's dependencies has changed since prelinking
S.?....T.   /bin/chmod
prelink: /bin/chown: at least one of file's dependencies has changed since prelinking

Taking a wild guess that this might be a library problem, and using what we
gleaned from ldd, we copied /lib/tls/* from the other working FC3 system: no
better, no worse, and we still get the same messages.

I've apparently got yum working, so I have some tools if I could force yum
to update.

Comment 8 Tim Waugh 2005-04-26 21:41:30 UTC
Well, what's LD_PRELOAD/LD_LIBRARY_PATH for root?
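The question points at the dynamic loader: LD_PRELOAD names libraries that ld.so loads into every dynamically linked program ahead of its normal dependencies, and LD_LIBRARY_PATH changes where ld.so searches for them. glibc also reads a system-wide preload list from /etc/ld.so.preload, which is a file and so never shows up in env or set. A sketch of a check for all three from inside a process, assuming a glibc system; report_loader_overrides is a hypothetical helper, and it only reports what is configured, not whether any of it is malicious.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Environment variables that make ld.so load extra or different
   shared libraries into every dynamically linked program. */
static const char *const LOADER_VARS[] = { "LD_PRELOAD", "LD_LIBRARY_PATH" };

/* Print each loader variable and whether glibc's system-wide preload
   file exists. Returns how many of these overrides are active. */
static int report_loader_overrides(FILE *out)
{
    int active = 0;
    for (size_t i = 0; i < sizeof LOADER_VARS / sizeof LOADER_VARS[0]; i++) {
        const char *val = getenv(LOADER_VARS[i]);
        fprintf(out, "%s=%s\n", LOADER_VARS[i], val ? val : "(unset)");
        if (val && *val)
            active++;                 /* non-empty value counts as active */
    }
    int preload = access("/etc/ld.so.preload", F_OK) == 0;
    fprintf(out, "/etc/ld.so.preload %s\n", preload ? "present" : "absent");
    return active + preload;
}
```

Running the same check as root and as a normal user and comparing the two reports would show whether the loader is configured differently for root.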

Comment 9 Neal Rhodes 2005-04-26 21:59:14 UTC
OK, I'm really trying not to be dense here. I have used Xenix/Unix/AIX for 20
years, but you've got me stumped.

Neither env nor set shows either of the above variables, so I don't know how to
answer that. They are not set for non-root users either, so that would be
identical.

A quick Google search suggests my hunch about what you are asking may be right.


Comment 10 Neal Rhodes 2005-04-26 23:33:57 UTC
OK, someone gave me a clue on recompiling coreutils, and I modified basename.c
thusly:

main (int argc, char **argv)
{
  char *name;

  printf("1"); fflush(stdout);
  initialize_main (&argc, &argv);
  printf("2"); fflush(stdout);
  program_name = argv[0];
  printf("3"); fflush(stdout);
  setlocale (LC_ALL, "");
  printf("4"); fflush(stdout);
  bindtextdomain (PACKAGE, LOCALEDIR);
  printf("5"); fflush(stdout);
  textdomain (PACKAGE);
  printf("6"); fflush(stdout);

  atexit (close_stdout);
  printf("7"); fflush(stdout);

Compiled it, sent it over to the crippled box, and ran it as a normal user:

[neal@idiot ~]$ ./basename
1234567./basename: too few arguments
Try `./basename --help' for more information.

as you would expect, and as root:
[root@idiot neal]# ./basename
Segmentation fault
[root@idiot neal]#
[root@idiot neal]# gdb ./basename
GNU gdb Red Hat Linux (6.1post-1.20040607.43rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/i586/libthread_db.so.1".

(gdb) start
Breakpoint 1 at 0x8048c59: file basename.c, line 91.
Starting program: /home/neal/basename

[6]+  Stopped                 gdb ./basename
[root@idiot neal]#

which is exactly what I got trying it as a normal user.

I surmise from the printfs that it is crapping out BEFORE even getting to
main(), somewhere in the C initialization, which puts it out of my league to
debug.
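The printf checkpoints support that deduction: if "1" never appears, the process died before the first statement of main(). Code legitimately runs before main(): the dynamic loader maps and relocates the shared libraries, the C runtime initializes itself, and constructor functions run. A minimal sketch of the last of these, using the GCC/Clang constructor attribute (a compiler extension, not standard C); it only illustrates that pre-main code exists and makes no claim about what actually crashed on this machine.

```c
#include <stdio.h>

/* Set by early_init() below; code inside main() can check it. */
static int ran_before_main = 0;

/* GCC/Clang extension: functions marked 'constructor' are called during
   program start-up, after shared libraries are loaded but before main().
   A crash at this stage kills the process before any printf() placed
   at the top of main() gets a chance to run. */
__attribute__((constructor))
static void early_init(void)
{
    ran_before_main = 1;
    fputs("early_init: running before main()\n", stderr);
}
```

With any main() added, the early_init line appears before anything main() prints, and ran_before_main is already 1 when main() starts.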

I tried running from gdb: 



Comment 11 Neal Rhodes 2005-04-27 18:59:29 UTC
For some reason I decided to run clamscan and found that one of the Progress
binaries had the Linux.Rst.A virus in it, as did most of /bin. 

I'm gradually correcting and scanning. Fortunately, rpm works, so I can install
clamav on the problem server.
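The rpm -Va output earlier in this report complained that the ELF headers of many /bin files had changed. As a first look at a suspect binary, its header can be read directly and the entry point compared with the same file from a clean machine. A sketch, assuming glibc's <elf.h> and a file in the host's byte order; read_elf_entry is a hypothetical helper, and a differing entry point is a reason to look closer (or run a scanner such as clamscan), not proof of infection.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read the entry-point address (e_entry) from an ELF file's header.
   Handles 32-bit (e.g. i386, as in this report) and 64-bit files,
   assuming the file uses the host's byte order.
   Returns 0 on success, -1 if unreadable or not ELF. */
static int read_elf_entry(const char *path, unsigned long long *entry)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];   /* larger of the two headers */
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    if (n < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;                           /* missing "\177ELF" magic */
    if (buf[EI_CLASS] == ELFCLASS32 && n >= sizeof(Elf32_Ehdr)) {
        Elf32_Ehdr h;
        memcpy(&h, buf, sizeof h);           /* avoid unaligned casts */
        *entry = h.e_entry;
        return 0;
    }
    if (buf[EI_CLASS] == ELFCLASS64 && n >= sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr h;
        memcpy(&h, buf, sizeof h);
        *entry = h.e_entry;
        return 0;
    }
    return -1;
}
```

Printing the result with `%#llx` for, say, /bin/ls on both machines gives a quick side-by-side comparison.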

