Red Hat Bugzilla – Bug 67949
badblocks incorrectly detected on raid autodetect partitions
Last modified: 2007-04-18 12:43:52 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.0.0) Gecko/20020530
Description of problem:
I have a system with several partitions marked as type "software raid
autodetect" (type fd, AFAIR). When I do a full install of Red Hat 7.3 on this
system, the installation fails after the badblock check on the constituent
partitions of the raid devices (see my problems in bug #66181), because bad
blocks are detected - but always on the last few blocks of the partition (the
number of "bad" blocks goes up with the size of the partition).
I have found that if I boot into the rescue image, and use fdisk to change the
partition types FROM "raid autodetect" TO just "linux", the next time I run the
installer, no bad blocks are detected.
This holds even if I then change the types back to "raid" at the partitioning stage.
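(For reference, the current types can be checked from the rescue shell before
touching anything - the disk name below is only an example, substitute your own:

# fdisk -l /dev/hda

raid members show up with Id "fd" / System "Linux raid autodetect", plain data
partitions with Id "83" / System "Linux".)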
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Re-install Red Hat 7.3 on a machine with partitions of type "raid autodetect"
2. During installation, leave the partitions set to "raid", but check for
badblocks and reformat
Actual Results: Installation fails because bad blocks are found. The detection
of bad blocks is incorrect, and it causes further problems because the
installer crashes when bad blocks are encountered (see bug #66181).
Expected Results: Badblocks should not be detected. There is nothing physically
wrong with the disks. Installation should complete.
I have concluded that if the kernel boots with partitions set to raid
autodetect, it "protects" the last few blocks of each such partition (I believe
this is where the raid superblock lives). This "protection" shows up as bad
blocks during the installation check.
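(If memory serves, the version-0.90 md superblock sits 64 KB from the end of
the member device, aligned down to a 64 KB boundary, roughly

    sb_offset_kb = (partition_size_kb rounded down to a multiple of 64) - 64

so the reserved area is always somewhere in the last 64-128 KB of the
partition - exactly the region where the spurious bad blocks get reported.)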
A work-around for installing in this situation is to boot the rescue disk and
use fdisk to change all "raid autodetect" partition types to plain "linux".
Then, during the install, change the partitions back to raid and install as
usual (you can even check for bad blocks with no problem).
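In case it helps anyone hitting the same thing, the rescue-shell step looks
something like this (the disk and partition numbers are placeholders for your
own layout):

# fdisk /dev/hda
Command (m for help): t
Partition number (1-8): 5
Hex code (type L to list codes): 83
Command (m for help): w

Repeat the 't' step for every raid member before writing the table with 'w',
then reboot into the installer.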
I _think_ this problem may exist in some form for RHL 7.2. If I run badblocks on
the constituent partition of a RAID1 device on another machine I maintain, I
also see badblocks on the last couple of blocks of the partition.
It _may_ be a Red Hat kernel-specific problem. Running the same test on another
RHL 7.2 box, but with a stock 2.4.16 kernel, does not show the problem.
For my system data, see the anaconda dump attached to bug #66181.
So you selected to do a 'bad blocks' check on the partitions when you created
them in disk druid?
Yes, I selected a badblocks check when I changed the partition type and
formatted in disk druid - so I get the symptoms of bug #66181.
HOWEVER, I'm pretty sure that bad blocks shouldn't be detected at all (and thus
trigger bug #66181). I see the same detection "pattern" on a Red Hat 7.2 box
(up and running, not during installation - see above).
I've just run a badblock check on one of the constituent partitions of a 5.1G
RAID1 device on the now-running machine (described in the install above), and
a bad block turns up again, on the last-but-one block of the partition (as
expected from my first comment).
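(The check itself is nothing special - a plain read-only run against a single
member partition is enough to see it; the device name here is just an example:

# badblocks -sv /dev/hda3

With the type still set to fd the last block or two come back bad; with the
type changed to 83 the very same partition checks clean.)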
Your skepticism about bypassing the badblocks check is well-founded.
We did that. We subsequently found that one of the two disks wasn't in the
mirrored RAID array:
[root@1post root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md3 : active raid1 hda7
513984 blocks [2/1] [U_]
md5 : active raid1 hda6
513984 blocks [2/1] [U_]
md4 : active raid1 hda5
513984 blocks [2/1] [U_]
md1 : active raid1 hda3
15649088 blocks [2/1] [U_]
md0 : active raid1 hda2
2048192 blocks [2/1] [U_]
md2 : active raid1 hda1
48064 blocks [2/1] [U_]
unused devices: <none>
Note that not all of the partitions have the badblock problem. But all the
partitions on /dev/hda are kicked out of the array.
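(For completeness: once the failed drive has been replaced, or cleared by the
vendor diagnostics, the missing halves can be hot-added back one at a time,
e.g. with raidtools' raidhotadd /dev/md3 /dev/hdc7, or mdadm /dev/md3 -a
/dev/hdc7 if mdadm is installed - the device names are only placeholders -
and /proc/mdstat will then show the mirrors resyncing.)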
Unfortunately, I'm more than familiar with the symptoms of hard drive failure in
a software RAID array (I seem to have had more than my fair share die over the
years). It does look as if your drive is broken (though it's always worth a
check with the manufacturer's verification tool). If you also look through
/var/log/messages you should be able to find where the kernel hits the errors,
shuts down the drive and kicks the partitions out of the RAID array.
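(Something along these lines usually turns the relevant messages up - the drive
letter range is only a guess, adjust it for your setup:

# grep -iE 'hd[a-h]: .*(error|dma_intr)' /var/log/messages
# grep -i 'disk failure' /var/log/messages

The typical 2.4 IDE failure lines look something like "hda: dma_intr:
status=0x51 { DriveReady SeekComplete Error }", followed by md marking the
member faulty.)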
However, in my case, I'm pretty sure the drive is not at fault. As detailed in
my original comment, the same partition passes a badblock check _unless_ the
partition type is raid autodetect. Indeed, if the partition isn't set to raid
auto at the start of the installation, I can successfully install _including_
the anaconda badblock check. /proc/mdstat on all the machines where I see this
problem shows all partitions as up and active. The manufacturer's diagnostic
checks all come back clean.
This does not look like something caused by the userspace e2fsprogs to me.
Closing this here.
Florian La Roche