From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.0.0) Gecko/20020530

Description of problem:
I have a system with several partitions marked as type "software raid autodetect" (type fd, AFAIR). Doing a full install of Red Hat 7.3 on this system, the installation fails after doing a badblocks check on the constituent partitions of the raid devices (see my problems in bug #66181), because bad blocks are detected - but always on the last few blocks of the partition (the number of "bad" blocks goes up with the size of the partition). I have found that if I boot into the rescue image and use fdisk to change the partition types FROM "raid autodetect" TO just "linux", then the next time I run the installer no bad blocks are detected. This holds even if I then change the types back to "raid" at the partitioning stage.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Re-install Red Hat 7.3 on a machine with partitions of type "raid autodetect"
2. During installation, leave the partitions set to "raid", but check for bad blocks and reformat

Actual Results:
Installation fails because bad blocks are found. The detection of bad blocks is incorrect, but this causes further problems because the installer crashes when bad blocks are encountered (see bug #66181).

Expected Results:
Bad blocks should not be detected - there is nothing physically wrong with the disks - and installation should complete.

Additional info:
I have concluded that if the kernel boots with partitions set to raid autodetect, it "protects" the last few blocks of each such partition (I believe this is where the raid superblock lives). This "protection" shows up as bad blocks during the installation check. A work-around for installing in this situation is to boot the rescue disk and use fdisk to change all "raid autodetect" partition types to just "linux". Then, during installation, change the partitions back to raid and install as usual (you can even check for bad blocks with no problem).

I _think_ this problem may exist in some form in RHL 7.2. If I run badblocks on a constituent partition of a RAID1 device on another machine I maintain, I also see bad blocks on the last couple of blocks of the partition. It _may_ be a Red Hat kernel specific problem: running the same test on another RHL 7.2 box, but with a stock 2.4.16 kernel, does not show the problem.

For my system data, see the anaconda dump attached to bug #66181.
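A rough sketch of the work-around from the rescue shell; the device and partition names here (/dev/hda, /dev/hda5) are only placeholders for whatever holds the raid members on a given system:

  # In the rescue shell, change the member partitions from type fd
  # ("Linux raid autodetect") to type 83 ("Linux"):
  fdisk /dev/hda
  #   t   - change a partition's type (enter the partition number when asked)
  #   83  - set the type to "Linux"
  #   w   - write the partition table and exit

  # A read-only badblocks check run by hand should now report no errors:
  badblocks -sv /dev/hda5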
So you selected the 'bad blocks' check on the partitions when you created them in Disk Druid?
Yes, I selected a badblocks check when I changed the partition type and formatted in Disk Druid, so I get the symptoms of bug #66181. HOWEVER, I'm pretty sure that bad blocks shouldn't be detected at all (and thus trip bug #66181).

I see the same detection "pattern" on a Red Hat 7.2 box (up and running, not during installation - see above). I've just run a badblocks check on one of the constituent partitions of a 5.1G RAID1 device on the now-running machine (the one described in the install above), and a bad block turns up again, on the last-but-one block of the partition (as expected from my first comment).
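If the "protection" theory above is right, the blocks reported bad should coincide with the md superblock area, which for the 0.90 superblock format used by these kernels sits in the last 64 KB of the partition (rounded down to a 64 KB boundary). A rough way to check, with /dev/hda5 again standing in for the real member partition:

  # /proc/partitions lists the partition size in 1 KB blocks, which is the
  # same unit badblocks reports by default:
  grep hda5 /proc/partitions

  # The 0.90 superblock starts at (size rounded down to a 64 KB multiple)
  # minus 64 KB, so any blocks reported "bad" should fall inside that final
  # 64 KB window rather than being scattered across the partition.
  badblocks -sv /dev/hda5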
Your skepticism about bypassing the badblocks check is well-founded. We did that. We subsequently found that one of the two disks wasn't in the mirrored RAID array:

[root@1post root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md3 : active raid1 hda7[0]
      513984 blocks [2/1] [U_]
md5 : active raid1 hda6[0]
      513984 blocks [2/1] [U_]
md4 : active raid1 hda5[0]
      513984 blocks [2/1] [U_]
md1 : active raid1 hda3[0]
      15649088 blocks [2/1] [U_]
md0 : active raid1 hda2[0]
      2048192 blocks [2/1] [U_]
md2 : active raid1 hda1[0]
      48064 blocks [2/1] [U_]
unused devices: <none>

Note that not all of the partitions have the badblock problem. But all the partitions on /dev/hda are kicked out of the array.
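For completeness: once the failed half is replaced (or if a member was only kicked out transiently), the raidtools of that era can put it back into the degraded arrays. The second drive is assumed here to be /dev/hdc with a matching partition layout, which is not shown in the output above:

  # Re-add the missing mirror half to one of the degraded arrays;
  # repeat for each md device with its corresponding partition:
  raidhotadd /dev/md0 /dev/hdc2

  # Watch the resync progress:
  cat /proc/mdstat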
scott: Unfortunately, I'm more than familiar with the symptoms of hard drive failure in a software RAID array (I seem to have had more than my fair share die over the years). It does look as if your drive is broken (though it's always worth a check with the manufacturer's verification tool). If you look through /var/log/messages you should be able to find where the kernel hits the errors, shuts down the drive and kicks the partitions out of the RAID array.

However, in my case I'm pretty sure the drive is not at fault. As detailed in my original comment, the same partition passes the badblocks check _unless_ the partition type is raid autodetect. Indeed, if the partition isn't set to raid auto at the start of the installation, I can install successfully, _including_ the anaconda badblocks check. /proc/mdstat on all the machines I see this problem on shows all partitions as up and active, and the manufacturer's diagnostic checks are all good.
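One way to dig those events out of the logs (hda taken from your mdstat output above; adjust the pattern for your drives):

  # Kernel I/O errors on the drive, followed by md disabling the member and
  # running the array degraded, should all show up in the messages log:
  grep -iE 'hda|raid1|md[0-9]' /var/log/messages | less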
This does not look to me like something caused by the userspace e2fsprogs. Closing this here.

Florian La Roche