Bug 49117

Summary: Root filesystem corruption on Toshiba Tecra8100
Product: [Retired] Red Hat Linux
Reporter: Alexandre Oliva <aoliva>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE
QA Contact: Brock Organ <borgan>
Severity: high
Priority: medium
Version: 7.1
CC: mhw
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:39:05 UTC
Attachments:
Description: /var/log/messages from crashed red hat 7.2 system
Flags: none

Description Alexandre Oliva 2001-07-13 23:05:07 UTC
Every now and then, the behavior of my laptop changes without my having
asked for it.  A while ago, the sound configuration stopped working.
After some investigation, I found
out /etc/esd.conf had become corrupt (attempts to access it would result in
an I/O error).  Removing the file and re-installing the esound package
fixed it.

A couple of days later, it was groff that became corrupt, and, after
struggling to remove some broken files and directories and re-installing
the groff package, I was able to view man-pages again.

A few days ago, the system stopped enabling eth0 when I connected the
PCMCIA card, and I had to enable it by hand every time.  It turned out that
/etc/hotplug/hotplug.functions had become an empty file, and net.agent was
inaccessible (I/O error).  Oh, and some doc files in the esound package
were corrupt again.

It doesn't seem like the disk is the problem (no messages in
/var/log/messages); in fact, forcing a fsck on boot fixes all of the I/O
errors, but the files end up being removed because they have invalid modes.
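
One way to list the affected files before forcing a fsck is to try reading
every regular file under a directory and record the ones that fail.  The
following is a minimal sketch, not something that was run for this report.
The function name find_unreadable and its opener parameter are hypothetical
and exist only for illustration; the idea of scanning a tree such as /etc is
an assumption based on the files mentioned above.

```python
import errno
import os
import stat


def find_unreadable(root, opener=open):
    """Return (path, errno name) pairs for regular files that cannot be read.

    `opener` is a hypothetical hook so the read step can be swapped out;
    by default the built-in open() is used.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks, devices, sockets and other non-regular files.
                if not stat.S_ISREG(os.lstat(path).st_mode):
                    continue
                # Reading one byte is enough to trigger an I/O error (EIO)
                # on a file whose metadata or first block is damaged.
                with opener(path, "rb") as handle:
                    handle.read(1)
            except OSError as exc:
                code = errno.errorcode.get(exc.errno, str(exc.errno))
                failures.append((path, code))
    return failures
```

Calling find_unreadable("/etc") would return an empty list on a healthy tree
and entries tagged "EIO" for files that fail with an I/O error.  It reads only
the first byte of each file, so damage further into a file goes unnoticed.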

I have two theories.

It may be that the sound kernel module is the culprit, because I don't use
it very often, and corruption often comes up a few days after I use the
sound card to play something.  This theory is supported by the fact that,
if I play something one day (usually Ogg Vorbis files), I often have to
reboot the machine for the sound card to work again on the following day.

The other possibility is that apm is to blame.  Hibernating generally
works for me, but I often find garbage at the top of the screen when the
machine comes back up.  If it's corrupting video memory, it may well be
corrupting system memory too.  I suppose :-)

Sorry for the imprecise bug report and the number of open variables, but I
haven't been able to narrow it down any further.

The machine in question is a Toshiba Tecra 8100 with a Pentium III 700MHz
and 256 MB of memory.  Here's what I get when I modprobe ymfpci (the sound
card kernel module):

Jul 13 20:07:27 guarana kernel: PCI: Found IRQ 11 for device 00:0c.0
Jul 13 20:07:27 guarana kernel: PCI: The same IRQ used for device 00:05.2
Jul 13 20:07:27 guarana kernel: ymfpci: YMF744 at 0xefff8000 IRQ 11
Jul 13 20:07:27 guarana kernel: ac97_codec: AC97 Audio codec, id: 0x414b:0x4d05 (Unknown)

Comment 1 Alain Wenmaekers 2001-08-03 10:00:54 UTC
I also have a Tecra 8100 (a PIII 600 model).

I never had any corrupted files on it. I use it quite a lot (and this one too
plays music). It is the same soundcard (same id and everything).

My guess is that it is APM. APM actually does NOT work here at all (and it did
work in RH7.0). For example, after apm --suspend the machine does suspend, but
on power-up the BIOS says it could not return from suspend mode, and then it
boots normally.


Grtz

Comment 2 Mark Wilkinson 2001-10-31 14:45:25 UTC
I think I've recently experienced something similar: I have a Sony Vaio
(PCG-N505X) which I've upgraded with a 20 GB IBM-DJSA-220 disk. After running Red
Hat 7.1 with no problems for a while, the machine locked up solid on me last
Monday while I was using Nautilus. X hung completely, and when I connected it to
the network the PCMCIA card wasn't detected (I didn't hear the two beeps).

Rebooting (using the power switch) left me with lots of filesystem problems:
lots of things moved to /lost+found and rpm -Va showed lots of things missing.
Some files seemed to have had their content replaced with small GIFs, which
seemed odd, and lots of the throbber images from Nautilus had vanished.

Preceding this, I'd upgraded the kernel to the 2.4.9-6 release; I also upgraded
my Ximian Gnome packages just prior to last Monday. I have the package
list that was on the machine at the time if that's useful.

After fsck finished, a number of the packages were corrupt, and I was a bit
concerned about the vulnerability of the filesystem, so I decided to back up what
was left, do a 7.2 install and pull the files I needed to migrate forward out of
the backup. I fancied trying out the new journalling stuff.

After a plain laptop install of 7.2 (no Ximian stuff) things seemed Ok, but this
morning I turned the machine on, logged into Gnome and the machine locked solid
again. I rebooted the machine and the filesystem seemed to check Ok, but
rc.sysinit failed and dropped me into the repair filesystem shell. The first
error message was "/etc/init.d/functions: not a directory", and it turns out
that /etc/rc.d/init.d has been turned into a 24x24 GIF of a rightward pointing
hand in a circle. Some other files have gone the same way: /usr/lib/perl5, for
example, is now a 24x24 GIF of an exclamation mark on a red octagon.

Looking through /var/log/messages, it looks as though something was causing the
ymfpci driver to be loaded repeatedly: there are 12 occurrences of the message
"YMF744 at 0xfedf8000 IRQ 9" (and the other stuff that goes with it) in the 50
seconds before the messages file ends.

I've since gone back to the messages file from the first crash (under 7.1 with
the newer kernel) and see similar evidence in /var/log/messages there too. Prior
to Oct 26 the YMF744 message would occur infrequently. On Oct 26 (when I updated
the Ximian Gnome packages, including the new Nautilus) it appears 28 times, and
on Oct 29 5 times in the minute before the machine locked up.

One common factor is that both 7.2 and 7.1 with the latest Ximian updates
include the recent set of patches to nautilus. Perhaps this is stressing the
sound subsystem somehow by forcing the module to be reloaded repeatedly, leading
to a lockup in the kernel.
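
The per-day counts quoted above can be reproduced by scanning the log for the
driver's probe message. What follows is a minimal sketch, assuming the usual
syslog line layout seen in the description (month, day, time, host). The
helper name count_loads_per_day is hypothetical, and the default marker string
is taken from the messages quoted in this report.

```python
from collections import Counter


def count_loads_per_day(lines, marker="YMF744 at"):
    """Count log lines containing `marker`, keyed by 'Mon DD'.

    `lines` is any iterable of strings, for example an open log file.
    """
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # too short to carry a syslog date
        # split() collapses the padding syslog uses for single-digit days,
        # so "Oct  9" and "Oct 9" end up under the same key.
        counts[f"{fields[0]} {fields[1]}"] += 1
    return counts
```

Passing an open handle on /var/log/messages to count_loads_per_day gives one
total per day, which makes a sudden jump in module loads easy to spot. The
key carries no year, so a log spanning more than one year would merge days
with the same date.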

Comment 3 Mark Wilkinson 2001-10-31 14:56:06 UTC
Created attachment 35841 [details]
/var/log/messages from crashed red hat 7.2 system

Comment 4 Bugzilla owner 2004-09-30 15:39:05 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/