Bug 89174

Summary: anaconda incorrectly detects bad blocks during installation
Product: [Retired] Red Hat Linux
Reporter: Josh Willis <jwillis>
Component: e2fsprogs
Assignee: Florian La Roche <laroche>
Status: CLOSED WORKSFORME
QA Contact: Jay Turner <jturner>
Severity: medium
Priority: medium
Version: 9
CC: edfriedmangvs, hugh, srevivo
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Last Closed: 2003-06-03 12:43:34 UTC

Description Josh Willis 2003-04-19 06:35:09 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
When performing a fresh install, I assigned an existing ext3 partition to be
/, and selected it for the badblocks check in Disk Druid.  When the check ran,
it got all the way to the end, then reported that bad blocks were detected and
halted the install.

However, on looking at the partition through the rescue disk with both 'e2fsck
-c -v' and 'badblocks' directly, there were no bad blocks reported.  This was
true even when the non-destructive read/write test ('e2fsck -c -c -v') was
performed, and held true when using both the e2fsck from RHL 7.3, and that from
RHL 9 (1.32).  Thus I'm reasonably confident that there aren't really any bad
blocks on the disk.  

I tried this twice, then deselected the badblocks check in Disk Druid, and the
install proceeded.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install RHL 9, selecting manual partitioning with Disk Druid
2. On partition selected for installation (/), choose 'check for bad blocks'

Actual Results:  At end of badblocks check, installation reported 'Bad blocks
found' and aborted the install

Expected Results:  No bad blocks should have been detected, and installation
should have proceeded after badblocks check.

Additional info:

Hard drive is an IBM DDRS-34560D (SCSI) and relevant partition is /dev/sda1.

Comment 1 Daniel Hammer 2003-04-30 21:20:08 UTC
I had the same problem today with two completely new hard disks; both were
checked without complaints by RHL 8.0 and Mandrake.

Comment 2 D. Hugh Redelmeier 2003-05-03 07:26:44 UTC
I've just experienced the same problem on two different machines with two
different disks.

Both machines were Athlons with recent VIA chipsets (Asus A7V333 and Gigabyte  
GA-7VAXP motherboards) and large WD hard disks (120G and 80G).

The rest of this report will be based on the Gigabyte system experience (because
that is what I have at hand as I type this).

I was replacing Phoebe on this large extended partition.  I told Disk Druid to
use it for /, format the partition (not an upgrade), and check for bad blocks.

The installation GUI failed very near the end of checking with the message "Bad
blocks have been detected on device /dev/hda8. We do not recommend you use this
device. Press <Enter> to reboot your system.  [OK]"

On console 4, there is an interesting sequence of messages:
<6> EXT3 FS 2.4-0.9.19, 19 August 2002 on IDE0(3,8), internal journal
<6> EXT3-fs: mounted filesystem with ordered data mode.
<6> attempt to access beyond end of device
<6> 03:08: rw=0, want=8385900, limit=8385898

This suggests to me that the badblocks program (run by mke2fs) has a bug: it is
trying to access beyond the end of the partition.

The partition, according to fdisk, is 8385898+ blocks long (in other words,
16,771,797 sectors of 512 bytes).

Why are so few folks reporting this bug?  Perhaps it only appears when several
factors come together.

The partition is quite large: more than 2^32 bytes.
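
The size figures above can be checked with a short sketch. It assumes fdisk's
usual conventions: sizes are reported in 1024-byte blocks, and the trailing "+"
marks one extra 512-byte sector. The function name is made up for illustration
only.

```python
# Sanity check of the partition size figures quoted in this comment.
# Assumptions: fdisk reports 1024-byte blocks, and "+" means one extra
# 512-byte sector beyond the whole-block count.
BLOCK_SIZE = 1024
SECTOR_SIZE = 512


def partition_sectors(blocks, has_extra_sector):
    """Convert an fdisk block count (with optional '+') to 512-byte sectors."""
    sectors = blocks * (BLOCK_SIZE // SECTOR_SIZE)
    return sectors + (1 if has_extra_sector else 0)


sectors = partition_sectors(8385898, has_extra_sector=True)
size_bytes = sectors * SECTOR_SIZE

print(sectors)               # 16771797
print(size_bytes > 2 ** 32)  # True
```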

Maybe something is reading multiple blocks at once and hits a problem when there
are not enough blocks at the end.  Supporting this is what console 5 says at the
time of the error:
Checking for bad blocks in read-only mode
From block 0 to 8385898
Checking for bad blocks (read-only test): 838589696/ 8385898

Notice that the numerator of this last "fraction" is nonsensical.  I suspect
that the last two digits ("96") are duplicated due to some formatting bug.  If
so, the block being read is 8385896, and that access stays within bounds only
as long as no more than 2(?) blocks are read at once.
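
The bounds reasoning can be sketched as follows. Two things here are
assumptions, not confirmed facts: that the real starting block is 8385896 (the
numerator with the suspected duplicate digits removed), and that the kernel's
"want" value is the end of the request in the same 1 KB units as "limit". The
helper names are invented for illustration.

```python
# Bounds check for a multi-block read near the end of the partition.
LIMIT = 8385898  # "limit=" from the console 4 kernel message
WANT = 8385900   # "want=" from the same message
START = 8385896  # assumed: progress numerator without the duplicated "96"


def read_end(start, nblocks):
    """First block past the end of a read of nblocks starting at start."""
    return start + nblocks


def within_device(start, nblocks, limit=LIMIT):
    """True if the whole read fits inside the device."""
    return read_end(start, nblocks) <= limit


# Largest read from START that still fits inside the partition.
max_blocks = LIMIT - START
print(max_blocks)                    # 2
print(within_device(START, 2))       # True
print(within_device(START, 4))       # False
print(read_end(START, 4) == WANT)    # True
```

Under those assumptions, a read of four blocks starting at 8385896 would end
exactly at the reported want=8385900, which is consistent with something
reading several blocks at once.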

Comment 3 Michael Fulbright 2003-05-20 19:27:09 UTC
Refiling against mke2fs.

We are removing the bad blocks check from the installation program, as it was
rarely used, and modern hardware handles most correction silently and/or people
use RAID for critical data.

Comment 4 Florian La Roche 2003-06-03 12:43:34 UTC
For the report with access beyond the partition, please open a new bug report
if the partition code is in fact wrong and the end of the partition is not
accessible. This should go against either libparted or against anaconda.

For the other reports, this sounds like kernel problems reading the drive
correctly, but does not look like problems in e2fsprogs. Please file those
again with hardware information if you have a reproducible test case.

Thanks a lot for your reports,

Florian La Roche


Comment 5 Ed Friedman 2003-07-28 18:56:23 UTC
I have a situation in which I cannot install Red Hat 9 onto an IDE hard drive
due to errors on the drive during the installation.  This computer does not
contain critical data, and no RAID is being used.  At this point, the only way
I see to get an install working is to install Red Hat 8 with the check for bad
blocks enabled, then boot from a Knoppix CD and erase everything in the
partition that I just formatted, and finally install Red Hat 9 without
formatting that partition.

If there is some simpler way to install Red Hat 9, please let me know.  If this
is the only way to install it, then wouldn't it be simpler to keep "check for
bad blocks" as a workable option during installation for future releases?