Red Hat Bugzilla – Bug 174306
'Unconventional' disk druid installation (RAID/non-RAID) scratches MBR on first disk
Last modified: 2008-07-24 13:56:48 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7
Description of problem:
I have a Primergy L100 with two hard disks, /dev/hde and /dev/hdg
(these are the only disks on that system and these are factory settings)
(I have a setup similar to the one below on a Scaleo, but that one was
not set up through the installation procedure but 'after the fact' once
the /boot RAID broke.)
I want to install these disks as follows:
                /dev/hde1  /dev/hde2   /dev/hde3   /dev/hde4
/dev/hde ---> | /boot    | md-RAID1  | md-RAID1  | md-RAID1  |
                /dev/hdg1  /dev/hdg2   /dev/hdg3   /dev/hdg4
/dev/hdg ---> | /boot2   | md-RAID1  | md-RAID1  | md-RAID1  |
forming:                 | /dev/md0  | /dev/md1  | /dev/md2  |
onto which I put:        | swap 1    | swap 2    | LVM       |
and onto that:                                   | root fs etc. |
That is, I do not want to have the /boot partition on an md
device (I have had a few problems with that approach) but the
rest I *do* want to have on a mirrored md device (incl. the swap space).
How do I set this up?
* When disk druid comes up, configure the four partitions above as
'software RAID' partitions on /dev/hde. Just 'force' the first one to
be a primary partition.
* Clone /dev/hde to /dev/hdg.
* Modify the filesystem type of /dev/hde1 and /dev/hdg1 from 'software
RAID' to 'ext3' and set the mount points.
* Bind the other partitions into RAID devices and set up swap and LVM on
them (a rough shell sketch of the equivalent manual setup follows below).
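(For reference, a rough shell sketch of what the equivalent manual setup
would look like outside the installer; mdadm and LVM are assumed to be
available, the volume group / logical volume names and sizes are
placeholders, and the initial partitioning of /dev/hde is omitted:)

  # mirror hde's partition layout onto hdg (what the 'clone' step does)
  sfdisk -d /dev/hde | sfdisk /dev/hdg

  # build the three RAID1 mirrors from the matching partition pairs
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hde2 /dev/hdg2
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hde3 /dev/hdg3
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/hde4 /dev/hdg4

  # swap on the first two mirrors
  mkswap /dev/md0 && swapon /dev/md0
  mkswap /dev/md1 && swapon /dev/md1

  # LVM on the third mirror; the root fs lives in a logical volume on top
  # ('vg0', 'root' and the 8G size are placeholder values)
  pvcreate /dev/md2
  vgcreate vg0 /dev/md2
  lvcreate -L 8G -n root vg0
  mkfs.ext3 /dev/vg0/root

  # /boot and /boot2 stay plain ext3 on the raw partitions
  mkfs.ext3 /dev/hde1
  mkfs.ext3 /dev/hdg1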
After that, installation proceeds without any problem. However, on the
first reboot, the partition table on /dev/hde is gone. /dev/hdg still
has a valid partitioning, so I suppose something messed up the MBR on
/dev/hde. A 'dd' of the MBR on both disks shows that the /dev/hde MBR is
shifted by 16 NUL bytes. Attached is a dump (blocks obtained with
dd if=/dev/hde bs=512 count=1).
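(For anyone reproducing this, a minimal way to capture and compare the
two MBRs from a shell; the output file names are arbitrary:)

  # dump the first 512-byte sector (MBR) of each disk
  dd if=/dev/hde of=/tmp/mbr-hde.bin bs=512 count=1
  dd if=/dev/hdg of=/tmp/mbr-hdg.bin bs=512 count=1

  # byte-wise comparison; a hex dump makes the 16-byte shift visible
  cmp -l /tmp/mbr-hde.bin /tmp/mbr-hdg.bin
  od -A d -t x1 /tmp/mbr-hde.bin | head
  od -A d -t x1 /tmp/mbr-hdg.bin | head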
I have tried the installation twice, with the same results. Installing
without RAID devices on /dev/hde only works without any problem.
Version-Release number of selected component (if applicable):
Red Hat ES 4.0 Update 2 installation ISO images
Steps to Reproduce:
1. Install as described above
Actual Results: /dev/hde no longer has a valid partition table.
Expected Results: /dev/hde should have a valid partition table.
Created attachment 121519 [details]
Dumps of the MBRs of /dev/hde (bad) and /dev/hdg (good)
Created attachment 121520 [details]
Bad MBR on /dev/hde
Created attachment 121521 [details]
Good MBR on /dev/hdg
Uhhh... there might be something wrong with the disk. It's gone from the disk
druid menu now. Maybe one should add SMART diagnostics to Anaconda? Will
check.
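(Assuming smartmontools is at hand, a quick manual SMART check would look
like this:)

  # overall health self-assessment and full attribute/error-log dump
  smartctl -H /dev/hde
  smartctl -a /dev/hde

  # run a short self-test, then read the result a few minutes later
  smartctl -t short /dev/hde
  smartctl -l selftest /dev/hde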
This problem is getting uncanny: I did a few (7) tests in which the
result was as described above, but now I have had a run of installations
in which the problem has mysteriously gone away.
I have, however, checked that it is not a *fault* in the hardware (it could
be a consistent error in the Promise RAID controller, for example, which,
even if RAID mode is disabled, still controls the hard disks, which is why
they are named /dev/hde and /dev/hdg): I checked both disks and moved them
to another machine of the same type; the same problematic behaviour occurred
until a few hours ago.
I know there *is* something up with the hardware as I have encountered some
problems re-reading the MBR after writing it in fdisk.
To reiterate the case where the MBR on /dev/hde has been destroyed:
* Install as described above, using the RAID cloning feature to set
up software RAID partitions on "/dev/hde" and "/dev/hdg", modify the
partition type of the first RAID partition on each disk to 'ext3'
to get a place to put /boot onto, bind the remaining partitions into
RAID mirrors, format, etc. Later repeats show that it is unimportant
whether there is a swap partition or not; the problem occurs regardless.
* Proceed with installation. No problem until reboot.
* After reboot, nothing bootable can be found, i.e. /dev/hde no longer
has a valid MBR.
* Reboot machine using the RH ES 4 installation CD. Disk druid now
consistently says that "/dev/hde" is gone (looks like it isn't installed).
"/dev/hdg" is visible and correctly partitioned. This "disk invisibility"
problem may or may not be related to the MBR problem, but I suspect some
hardware problem.
* On the console, fdisk *can* find "/dev/hde". The kernel can also find it:
In /tmp/syslog, the last message about that drive is
"hde: unknown partition table".
* I tried to set up a new partitioning on "/dev/hde"
with "fdisk", but when writing the partition table, I get "error 5:
I/O error. Kernel still uses the old table, the new table will be used
at the next reboot".
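(One can also ask the kernel to re-read the table without a reboot; if the
controller is wedged, these commands fail with a similar I/O error, which
would point at the hardware rather than at fdisk:)

  # ask the kernel to re-read the partition table on /dev/hde
  blockdev --rereadpt /dev/hde
  # alternative, from the parted package
  partprobe /dev/hde
  # check what the kernel now believes the partitions are
  cat /proc/partitions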
* I zeroed the disk using 'autoclave' from the Ultimate Boot CD, but after
a reboot into the RH ES 4 installation CD, disk druid could still not see
the disk.
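(The same zeroing can be done with dd from a rescue shell, without a
separate boot CD:)

  # wipe just the MBR (boot code + partition table) of /dev/hde
  dd if=/dev/zero of=/dev/hde bs=512 count=1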
* I wrote some binary garbage into the MBR of that disk using PTS Disk
Editor to check whether the MBR could be properly written. Works.
(However, these DOS-based programs cannot be reliably started on the
present hardware; as said, there is something funny with it.)
After a reboot into the RH ES 4 installation CD, disk druid could still
not see the disk.
* However, an installation aborted before 'package installation' reveals
that the MBRs are correctly written on both disks.
Other things tried:
Installation as above, with md device cloning, but keeping "/boot" on a
software RAID mirror. After reboot, both disks are good but the machine can't
actually boot (i.e. you get the GRUB command line). No real surprise; see the
GRUB note below.
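(For completeness: with /boot on a RAID1 mirror, GRUB has to be installed
into the MBR of *both* disks by hand, since each half of the mirror looks
like a plain ext3 partition to it. The usual incantation from the GRUB
shell, device mapping assumed, would be:)

  grub> device (hd0) /dev/hde
  grub> root (hd0,0)
  grub> setup (hd0)
  grub> device (hd0) /dev/hdg
  grub> root (hd0,0)
  grub> setup (hd0)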
I will do more tests, but this begins to look like some unfathomable thing.
It's the hardware.
There is probably something wrong with the on-board Promise RAID controller.
(Promise Technology, Inc. PDC20265 (FastTrak100 Lite/Ultra100) (rev 02))
If I could solder it off, I would.
The problem boils down to the fact that a simple 'reboot' of the machine
won't work. After a reboot, the hard disks cannot be properly read, i.e.
/dev/hde seems to have no valid boot sector, and even though /dev/hdg at
least shows a valid partitioning, booting from it results in GRUB printing
odd characters and then stopping after printing out "stage2".
You actually have to "power cycle" the machine. After that, /dev/hde and
/dev/hdg are both visible, booting works (though not off an md device) and
the system comes up nicely. The disks have been set up as described on
2005-11-27, with /boot on /dev/hdg instead of /dev/hde to make sure
a /boot is available.
It looks like the only interesting problem is: why does disk druid declare
that the first hard disk does not exist, even though it does?
Well, I guess you may close this bug.
Repeat query for resolving bug (NOTABUG)
Per final comment, and at reporter's suggestion, we're finally closing this nonbug.