Bug 102422 - Memory holes in root partition, found at boot up
Product: Red Hat Linux
Classification: Retired
Component: initscripts
Hardware: athlon Linux
Priority: medium
Severity: medium
Assigned To: Bill Nottingham
QA Contact: Brock Organ
Reported: 2003-08-14 18:35 EDT by Lee McKusick
Modified: 2014-03-16 22:38 EDT

Doc Type: Bug Fix
Last Closed: 2003-08-14 19:01:33 EDT
Description Lee McKusick 2003-08-14 18:35:59 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.2) Gecko/20030716

Description of problem:
I have been battling a boot-time file system error that looks just like a
problem with /etc/rc.d/init.d/random. Hah... at the present state of my tests I
think I have a two-bit error at one memory address.

So here is the story.

During bootup, after setting hostname, system stops with "/" root partition file

Running e2fsck /dev/hdb8 (my / partition) has returned several kinds of errors. 

The failure last night was:

/ contains a file system with errors
/ duplicate blocks found
/: File /var/lib/rpm/Basenames (inode #254766, mod time Wed Aug 13 10:31:25 2003)
has 1 duplicate block shared with 1 file:
/:    <filesystem metadata>

The mod time above is about when I was running the Red Hat up2date RPM
update tool.

A previous boot time / filesystem error was reported as 

"Holes" in various files, with a pattern of byte offsets that looks a lot
like alternate superblock locations: ~16k, 32768, 98304, etc.
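
For comparison with the offsets above, here is a minimal sketch of where ext2
places backup superblocks when the sparse_super feature is on: in block group 1
and in groups whose number is a power of 3, 5, or 7. It assumes 4 KiB blocks and
32768 blocks per group, neither of which is stated in this report, and the
results are block numbers rather than byte offsets. The function names are made
up for illustration.

```python
def is_power_of(n, base):
    """Return True if n equals base**k for some integer k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def backup_superblock_blocks(total_groups, blocks_per_group=32768,
                             first_data_block=0):
    """List the block numbers holding backup superblocks (sparse_super layout).

    blocks_per_group and first_data_block are ASSUMED values matching a
    4 KiB block size; the reporter's real filesystem parameters are unknown.
    """
    blocks = []
    for group in range(1, total_groups):
        # Group 1 always has a backup; after that only powers of 3, 5 and 7.
        if group == 1 or any(is_power_of(group, b) for b in (3, 5, 7)):
            blocks.append(first_data_block + group * blocks_per_group)
    return blocks


if __name__ == "__main__":
    print(backup_superblock_blocks(total_groups=10))
```

With those assumed parameters the list starts at 32768 and 98304, the same
numbers quoted above, although the offsets in the hole reports may be measured
in different units.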

E2fsck would report a bunch of holes in one file, then it would give an
error report, and then report a bunch of holes in another file.

Specific files with damage repaired manually at boot time were:


Each file had holes reported at ~16k byte intervals.

I started searching for a malicious program running "dd ".

Nothing has changed in the dd entries of /var/log/messages over the past 3 months.

Using a find & grep against the entire hard disk, I can't find any plain text
malicious invocations of dd.

Pretty much, the only plaintext shell script running dd on my system is

My running copy of random passed the rpm signature verification and file
verification test provided for the initscripts package.

Manual runs of /etc/rc.d/init.d/random are all OK. 

The "stop" block invocation of "dd " seems to have the wrong if= and of= sense,
so I edited the dd statement in the stop block.
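
To make the expected data direction concrete, here is a minimal sketch,
assuming the stop block is meant to save a seed from the kernel random pool to
a file and the start block is meant to feed that file back in. In dd terms,
saving would have the random device as input (if=) and the seed file as output
(of=). The seed size, paths, and function names are assumptions for
illustration and are not taken from the initscripts package.

```python
import os
import tempfile

SEED_BYTES = 512  # ASSUMED seed size, for illustration only


def save_seed(seed_path, nbytes=SEED_BYTES):
    """Shutdown direction: read from the OS random source, write the seed file."""
    data = os.urandom(nbytes)
    with open(seed_path, "wb") as seed_file:
        seed_file.write(data)
    return len(data)


def restore_seed(seed_path, pool_path):
    """Boot direction: copy the saved seed file into the pool device.

    pool_path is a parameter so the sketch can be run against an ordinary
    file instead of a real device node.
    """
    with open(seed_path, "rb") as seed_file:
        data = seed_file.read()
    with open(pool_path, "wb") as pool:
        pool.write(data)
    return len(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        seed = os.path.join(tmp, "random-seed")      # hypothetical seed file
        fake_pool = os.path.join(tmp, "fake-pool")   # stand-in for the device
        print("saved", save_seed(seed), "bytes")
        print("restored", restore_seed(seed, fake_pool), "bytes")
```

If the directions were reversed at shutdown, the seed file would be read
rather than written, so a swapped dd by itself would not be expected to write
anything to the root partition.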

I am still getting root "/" filesystem damage. 

Using the memory test on the lnx-bbc.org bootable business card, I detected a
consistent memory error of 2 bits at 590.6 MB. Two months ago I added 512 Meg of
memory to my existing 256 Meg.

The memory error is in the 256 Meg SIMM.

I pulled the SIMM memory module with the error out.
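
The arithmetic behind blaming the 256 Meg module can be sketched as follows,
assuming the modules are mapped back to back in slot order with no interleaving
or remapping. That is an assumption; the real mapping depends on the
motherboard. The module labels and the function name are made up for
illustration.

```python
def locate_module(address_mb, modules):
    """Return (module_name, offset_mb) for a physical address given in MB.

    modules is a list of (name, size_mb) in the ASSUMED mapping order,
    lowest physical address first.
    """
    base = 0.0
    for name, size_mb in modules:
        if base <= address_mb < base + size_mb:
            return name, address_mb - base
        base += size_mb
    raise ValueError("address is beyond the installed memory")


if __name__ == "__main__":
    error_at = 590.6  # MB, as reported by the memory test

    # Case 1: the 512 Meg module is mapped first.
    name, offset = locate_module(error_at, [("512 Meg", 512), ("256 Meg", 256)])
    print(f"512 first: error in {name} module at offset {offset:.1f} MB")

    # Case 2: the 256 Meg module is mapped first.
    name, offset = locate_module(error_at, [("256 Meg", 256), ("512 Meg", 512)])
    print(f"256 first: error in {name} module at offset {offset:.1f} MB")
```

The address lands in the 256 Meg module only if the 512 Meg module is mapped
first, so retesting with one module installed at a time is a more direct check.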

If the memory error is the source of the disk errors then I propose that Red Hat
add a memtest program to the Distribution. Red Hat should add a memory test to
the "troubleshooting flow chart". 

Version-Release number of selected component (if applicable):

How reproducible:
Couldn't Reproduce

Steps to Reproduce:
1. Execute /etc/rc.d/init.d/random stop ... then same with start
2. Shutdown system and run e2fsck on the / partition. /dev/hdb8 on my system.

Actual Results:  The random shell script works just fine.
It is hard to tell if the "/" partition is unmounted before random actually
writes to it at shutdown time.

Additional info:

How could two bad bits at the 590.6 MB point actually cause damage to the root
file system?
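
One possible mechanism, sketched under assumptions: the kernel caches file
system metadata in RAM, so if a cached block holding block pointers happens to
sit on the bad cell and is later written back, the pointer stored on disk
refers to a different block than intended. That could plausibly show up in
e2fsck as duplicate blocks or holes. The block number below is hypothetical,
and the bit positions are borrowed from the guess about bits 10 and 9.

```python
def flip_bits(value, bit_positions, width=32):
    """Return value with the given bit positions inverted (width-bit word)."""
    mask = 0
    for bit in bit_positions:
        if not 0 <= bit < width:
            raise ValueError(f"bit {bit} is outside a {width}-bit word")
        mask |= 1 << bit
    return (value ^ mask) & ((1 << width) - 1)


if __name__ == "__main__":
    intended_block = 520000          # HYPOTHETICAL block pointer in an inode
    corrupted_block = flip_bits(intended_block, [9, 10])
    print("intended block :", intended_block)
    print("corrupted block:", corrupted_block)
    print("distance       :", abs(intended_block - corrupted_block), "blocks")
```

A pointer corrupted this way still looks like a valid block number, which is
why damage of this kind tends to be found by e2fsck rather than at the moment
it happens.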

I removed the bad SIMM to see if that stops the root file system damage.

The "holes" I saw at approx 16K byte intervals might be chunks of swapfile? Bits
10 and 9 set to zero in a 16 bit address?

How could I load dummy stuff into memory to put a known file right on top of the
bad bits in question?
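
A user-space program cannot normally choose which physical address its data
lands on, so the following is only a sketch of the write-then-read-back idea
that memory testers rely on, not a way to target the bad bits. The
StuckBitBuffer class is a made-up simulation of a faulty cell so the check has
something to detect; all names here are hypothetical.

```python
class StuckBitBuffer:
    """Simulated memory in which one byte always reads back with bits cleared."""

    def __init__(self, size, bad_offset, stuck_zero_mask):
        self._data = bytearray(size)
        self._bad_offset = bad_offset
        self._mask = stuck_zero_mask

    def __len__(self):
        return len(self._data)

    def __setitem__(self, index, value):
        self._data[index] = value

    def __getitem__(self, index):
        value = self._data[index]
        if index == self._bad_offset:
            value &= ~self._mask & 0xFF  # the stuck bits always read as zero
        return value


def pattern_test(buf, pattern=0xAA):
    """Write a pattern and then its complement; return all mismatches found.

    Each mismatch is a tuple (offset, expected_byte, actual_byte).
    """
    errors = []
    for value in (pattern, pattern ^ 0xFF):
        for i in range(len(buf)):
            buf[i] = value
        for i in range(len(buf)):
            actual = buf[i]
            if actual != value:
                errors.append((i, value, actual))
    return errors


if __name__ == "__main__":
    print("healthy buffer:", pattern_test(bytearray(4096)))
    faulty = StuckBitBuffer(1024, bad_offset=100, stuck_zero_mask=0x06)
    print("faulty buffer :", pattern_test(faulty))
```

Using both a pattern and its complement means every bit is tested as a 1 at
least once, so a bit stuck at zero cannot hide behind a pattern that already
had a zero in that position.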

The host computer is a 700 MHz Athlon, with 512 Meg memory, 3 hard disks, 1
CD-R/RW and 1 floppy. The installation is a "workstation" install with a /home
partition that has been used for about 5 years now.

I regularly use the up2date utility.
Output of uname -a:

Linux familybox 2.4.20-19.8 #1 Tue Jul 15 14:59:09 EDT 2003 i686 athlon i386
Comment 1 Bill Nottingham 2003-08-14 19:01:33 EDT
Memtest is on current development rescue images.

There's nothing you can do in general to deal with bad memory; there are some
patches that allow you to map around some bad bits, but it's not 100%
guaranteed. This currently does appear to be a hardware problem for you, though.
