Red Hat Bugzilla – Bug 67949
badblocks incorrectly detected on raid autodetect partitions
Last modified: 2007-04-18 12:43:52 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.0.0) Gecko/20020530
Description of problem:
I have a system with several partitions marked as type "software raid
autodetect" (type fd, AFAIR). When I do a full install of Red Hat 7.3 on this
system, the installation fails after the badblock check on the constituent
partitions of the raid devices (see my problems in bug #66181), because bad
blocks are detected - but always on the last few blocks of the partition (the
number of "bad" blocks goes up with the size of the partition).
I have found that if I boot into the rescue image, and use fdisk to change the
partition types FROM "raid autodetect" TO just "linux", the next time I run the
installer, no bad blocks are detected.
This holds even if I then change the types back to "raid" at the partitioning stage.
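(For reference, the current types can be checked from the rescue shell before
touching anything - the disk name below is only an example, substitute your own:

# fdisk -l /dev/hda

raid members show up with Id "fd" / System "Linux raid autodetect", plain data
partitions with Id "83" / System "Linux".)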
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Re-install Red Hat 7.3 on a machine with partitions of type "raid autodetect"
2. During installation, leave the partitions set to "raid", but check for
badblocks and reformat
Actual Results: Installation fails because bad blocks are found. The detection
of bad blocks is incorrect, and it causes further problems because the
installer crashes when bad blocks are encountered (see bug #66181).
Expected Results: Badblocks should not be detected. There is nothing physically
wrong with the disks. Installation should complete.
I have concluded that if the kernel boots with partitions set to raid
autodetect, it "protects" the last few blocks of each such partition (I believe
this is where the raid superblock lives). This "protection" shows up as bad
blocks during the installation check.
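(If memory serves, the version-0.90 md superblock sits 64 KB from the end of
the member device, aligned down to a 64 KB boundary, roughly

    sb_offset_kb = (partition_size_kb rounded down to a multiple of 64) - 64

so the reserved area is always somewhere in the last 64-128 KB of the
partition - exactly the region where the spurious bad blocks get reported.)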
A work-around for installing in this situation is to boot the rescue disk and
use fdisk to change all "raid autodetect" partition types to plain "linux".
Then, during the install, change the partitions back to raid and install as
usual (you can even check for bad blocks with no problem).
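In case it helps anyone hitting the same thing, the rescue-shell step looks
something like this (the disk and partition numbers are placeholders for your
own layout):

# fdisk /dev/hda
Command (m for help): t
Partition number (1-8): 5
Hex code (type L to list codes): 83
Command (m for help): w

Repeat the 't' step for every raid member before writing the table with 'w',
then reboot into the installer.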
I _think_ this problem may exist in some form for RHL 7.2. If I run badblocks on
the constituent partition of a RAID1 device on another machine I maintain, I
also see badblocks on the last couple of blocks of the partition.
It _may_ be a Red Hat kernel-specific problem. Running the same test on another
RHL 7.2 box, but with a stock 2.4.16 kernel, does not show the problem.
For my system data, see the anaconda dump attached to bug #66181.
So you selected to do a 'bad blocks' check on the partitions when you created
them in disk druid?
Yes, I selected a badblocks check when I changed the partition type and
formatted in disk druid - so I get the symptoms of bug #66181.
HOWEVER, I'm pretty sure that bad blocks shouldn't be detected at all (and thus
trigger bug #66181). I see the same detection "pattern" on a Red Hat 7.2 box
(up and running, not during installation - see above).
I've just run a badblock check on one of the constituent partitions of a 5.1G
RAID1 device on the now-running machine (described in the install above), and
a bad block turns up again, on the last-but-one block of the partition (as
expected from my first comment).
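(The check itself is nothing special - a plain read-only run against a single
member partition is enough to see it; the device name here is just an example:

# badblocks -sv /dev/hda3

With the type still set to fd the last block or two come back bad; with the
type changed to 83 the very same partition checks clean.)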
Your skepticism about bypassing the badblocks check is well-founded.
We did that. We subsequently found that one of the two disks wasn't in the
mirrored RAID array:
[root@1post root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md3 : active raid1 hda7
513984 blocks [2/1] [U_]
md5 : active raid1 hda6
513984 blocks [2/1] [U_]
md4 : active raid1 hda5
513984 blocks [2/1] [U_]
md1 : active raid1 hda3
15649088 blocks [2/1] [U_]
md0 : active raid1 hda2
2048192 blocks [2/1] [U_]
md2 : active raid1 hda1
48064 blocks [2/1] [U_]
unused devices: <none>
Note that not all of the partitions have the badblock problem. But all the
partitions on /dev/hda are kicked out of the array.
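(For completeness: once the failed drive has been replaced, or cleared by the
vendor diagnostics, the missing halves can be hot-added back one at a time,
e.g. with raidtools' raidhotadd /dev/md3 /dev/hdc7, or mdadm /dev/md3 -a
/dev/hdc7 if mdadm is installed - the device names are only placeholders -
and /proc/mdstat will then show the mirrors resyncing.)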
Unfortunately, I'm more than familiar with the symptoms of hard drive failure in
a software RAID array (I seem to have had more than my fair share die over the
years). It does look as if your drive is broken (though it's always worth a
check with the manufacturer's verification tool). If you also look through
/var/log/messages you should be able to find where the kernel hits the errors,
shuts down the drive and kicks the partitions out of the RAID array.
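(Something along these lines usually turns the relevant messages up - the drive
letter range is only a guess, adjust it for your setup:

# grep -iE 'hd[a-h]: .*(error|dma_intr)' /var/log/messages
# grep -i 'disk failure' /var/log/messages

The typical 2.4 IDE failure lines look something like "hda: dma_intr:
status=0x51 { DriveReady SeekComplete Error }", followed by md marking the
member faulty.)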
However, in my case, I'm pretty sure the drive is not at fault. As detailed in
my original comment, the same partition passes a badblock check _unless_ the
partition type is raid autodetect. Indeed, if the partition isn't set to raid
auto at the start of the installation, I can successfully install _including_
the anaconda badblock check. /proc/mdstat on all the machines where I see this
problem shows all partitions as up and active. The manufacturer's diagnostic
checks all come back clean.
This does not look like something caused by the userspace e2fsprogs to me.
Closing this here.
Florian La Roche