80628 – After a few days, system won't boot - unable to mount /proc

Summary: After a few days, system won't boot - unable to mount /proc

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	8.0
Hardware:	athlon
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-12-29 02:11 UTC by Mike McMullen
Modified:	2005-10-31 22:00 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:40:20 UTC
Embargoed:

Description Mike McMullen 2002-12-29 02:11:54 UTC

Description of problem:

Hi All,
 
Any help with the problem below greatly appreciated!
 
I am attempting to install RH 8.0 on a development server. I first tried
to install it on an AMD Duron system running at 900 MHz with 512MB
of PC133 memory and two 60GB ATA drives.
 
I configured a two-disk mirror using ext3 (journaling), with /boot as one
partition and / as another. I installed this as a server but also installed
all packages on the 3 CDs.
 
The installation seemed to go fine. I then used up2date to download the latest
patches/errata to the system. This too seemed to go fine. A few days later, after
moving the server, I rebooted the system and saw segmentation faults on system
commands such as "more". For example, "cat /etc/termcap" works, but
"more /etc/termcap" gives a segmentation fault. The difference between the two
commands is that cat is statically linked and more is dynamically linked.
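One way to check which binaries are statically linked is to look at the ELF program headers. A dynamically linked executable carries a PT_INTERP header that names its runtime loader (/lib/ld-linux.so.2 on 32-bit x86 glibc systems). This is what `file` reports as "dynamically linked". A static binary has no such header and never loads libc.so at run time. That is why a damaged shared library would break `more` but not a static `cat`. The Python sketch below is illustrative only and is not part of the original report; the function names are hypothetical.

```python
import struct

PT_INTERP = 3  # ELF program-header type that names the dynamic loader


def is_dynamically_linked(elf: bytes) -> bool:
    """Return True if the ELF image has a PT_INTERP segment (needs ld.so)."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = elf[4], elf[5]
    if ei_data == 1:
        endian = "<"  # little-endian (e.g. x86)
    elif ei_data == 2:
        endian = ">"  # big-endian
    else:
        raise ValueError("unknown ELF byte order")

    if ei_class == 1:  # 32-bit ELF header layout
        (e_phoff,) = struct.unpack_from(endian + "I", elf, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", elf, 42)
    elif ei_class == 2:  # 64-bit ELF header layout
        (e_phoff,) = struct.unpack_from(endian + "Q", elf, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", elf, 54)
    else:
        raise ValueError("unknown ELF class")

    # p_type is the first 32-bit field of every program header entry.
    for i in range(e_phnum):
        (p_type,) = struct.unpack_from(endian + "I", elf, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False


def is_dynamic_file(path: str) -> bool:
    """Read an executable from disk and classify it."""
    with open(path, "rb") as f:
        return is_dynamically_linked(f.read())
```

For example, on a healthy system `is_dynamic_file("/bin/more")` should return True. On a system like the one described, it would return False for a statically linked `/bin/cat`. On a live box, `file` or `ldd` gives the same answer. The sketch just makes the underlying check explicit.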
 
Thinking I might have bad memory, I halted the system, replaced the memory,
and tried to reboot. This time, however, the system would not come up at all:
it claimed it was unable to mount /proc and then went off into never-never land,
spitting out unreadable messages until I hit reset. The system never booted again.
 
I double-checked that the memory was fine. Thinking I might have a flaky
motherboard, I did the exact same installation on a 1GHz Duron with another
512MB of memory and two different ATA drives. Long story short: same results on
this server as on the first, after a few days.
 
I did some searching in Bugzilla and saw a data corruption issue with
ext3. While it's not exactly my configuration (I used Disk Druid to create ext3
partitions and then built a RAID device from them), I thought it might be
related. Sooooo....
 
I have a third server, an AMD 1.2GHz Athlon with a different 512MB of PC133 memory
in it and a single 80GB ATA drive. I used ext2 filesystems and created a /boot and
a "/" partition. After installation, I installed all the patches/errata RPMs.
So far it still reboots after 3 days; however, I am starting to notice a lot of
defunct processes left in the zombie state.
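A zombie (shown as `<defunct>` by `ps`) is a process that has exited but whose parent has not yet collected its exit status with `wait()`. On Linux, the third field of `/proc/<pid>/stat` holds the process state letter, and `Z` marks a zombie. The Python sketch below is illustrative only and is not from the original report; the function names are hypothetical. It lists such PIDs, which is roughly what `ps` shows in its STAT column.

```python
import os


def process_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before reading the state.
    _, _, rest = stat_line.rpartition(")")
    return rest.split()[0]


def find_zombies(proc_root: str = "/proc") -> list[int]:
    """List PIDs whose state is 'Z' (zombie, shown as <defunct> by ps)."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if process_state(f.read()) == "Z":
                    zombies.append(int(entry))
        except OSError:
            continue  # process exited between listdir() and open()
    return zombies
```

Calling `find_zombies()` on a Linux box returns the zombie PIDs. A steadily growing list, as described here and in Comment 6, means some parent process is not reaping its children, or cannot run far enough to do so.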
 
Has anyone seen this type of behavior and if so is there any remedy? 
 
TIA,
 
Mike

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
    
Actual results:


Expected results:


Additional info:

Comment 1 damonp 2003-05-21 16:38:08 UTC

I have this same problem on RH9 Athlon XP (no RAID) (brand new hw).

mounting proc /etc/rc.sysinit: line 90: 39 segmentation fault LC_ALL=C grep -q /initrd /proc/mount

If I go into rescue mode and chroot, grep (and most other command-line utilities)
segfault the same way. I've installed twice (swapping hard drives between tries).

During setup and configuration, the machine was rebooted at least 7 or 8 times.
Once configuration was complete, I walked away from the box, and it died
several days later.

Comment 2 Alan Cox 2003-06-08 15:25:12 UTC

Does memtest86 (www.memtest86.com) show any problem on either of these systems?
Does a current errata kernel help?

Comment 3 Carter Borst 2003-06-09 13:53:24 UTC

Immediately after installation, I upgraded a few systems (RH7.3 to start) to all
of the most recent errata and fixes.

After a system has been rebooted 2-3 times, this error shows up, though not on
all of them.

I have not attempted to use memtest86, but will run it in the near future.

Comment 4 damonp 2003-06-09 20:57:07 UTC

I found no errors with memtest86 running all of the tests. I also ran
PowerMax (the drive is a Maxtor) to test the hard drive. No problems there either.
Is there a better hard-drive test, or maybe a disk controller test?

Thanks

damonp

Comment 5 Alan Cox 2003-06-10 00:31:47 UTC

It doesn't feel like a disk controller error. The RAM test is important because
that is a common problem, but you seem to have passed it OK.

The next thing I'd like to do is try to eliminate power management and BIOS
events from the equation. If you can, turn off APM/ACPI power management in the
BIOS of a problem box, and also turn off USB legacy keyboard support if it has
it. Save the CMOS settings, then boot Linux and see whether it dies again in a
few days' time.

Comment 6 Eduardo Gimeno 2003-06-11 13:25:42 UTC

I have had the same problem some weeks ago.

I had RH8.0 running on a server for some months (almost since the release of
RH8.0). I had a small script running from crond every minute. One day I ssh'd to
the server and saw hundreds of the executables used by the script in
<defunct> state, because almost no command could be executed. I tried
"more /var/log/.." but got a segfault. The root fs was ext3; the PC was a PIII with IDE.

I rebooted, and the first error was "Cannot mount /proc", followed by many other
errors. When I got the password prompt (for system maintenance), it didn't
accept the password. I reset the password from a boot CD-ROM, and when I
rebooted and entered the password, every command gave a segfault
(including "mount", which was what had caused the first error).


I made a backup of the corrupt fs to another partition, reinstalled RH8.0 in
the same partition after mke2fs, and everything worked fine until...

until I tried to execute "vi" from the corrupt partition. It corrupted 
everything in my new partition again, so I had to reinstall once again. It 
acted like a virus, as if my old system executables were tainted.

Any idea of the cause?

Eduardo Gimeno

Comment 7 Bugzilla owner 2004-09-30 15:40:20 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
